perm filename T.TEX[TEX,DEK] blob
sn#841376 filedate 1987-06-14 generic text, type T, neo UTF8
\input basic
\tracingstats=3
{\def\t{\gdef\a##}\catcode`d=12\t1d#2#3{#2}}
\hfuzz=100P\ifdim12pt=1P\expandafter\a\error
\expandafter\else\romannumeral888\relax\fi
\showthe\hfuzz \showlists
\escapechar=`\↑↑M \catcode`\↑↑I=6
\everypar{1234567890123456789012345678901↑↑Ietc.}
\tracingrestores=1 \tracingonline=1 {\everypar{}}
\bye
\font\large=cmr10
\font\small=cmr7
\noindent
\hbadness=-1
\hbox to 100pt{\discretionary{\kern10pt}{\large Uff}{!}\penalty5000
\hskip0pt plus1pt\raise5pt\vtop{\hbox{\small da}\hrule height0.5pt}}
\tracingall
\showlists
\par
\N1. \[1] Introduction.
This is \TeX, a document compiler intended to produce high-quality typesetting.
The \PASCAL\ program that follows is the definition of \TeX82, a standard
version of \TeX\ that is designed to be highly portable so that identical
output
will be obtainable on a great variety of different computers.
The main purpose of the following program is to explain the algorithms of \TeX\
as clearly as possible. As a result, the program will not necessarily be very
efficient when a particular \PASCAL\ compiler has translated it into a
particular machine language. However, the program has been written so that it
can be tuned to run efficiently in a wide variety of operating environments
by making comparatively few changes. Such flexibility is possible because
the documentation that follows is written in the \.{WEB} language, which is
at a higher level than \PASCAL; the preprocessing step that converts \.{WEB}
to \PASCAL\ is able to introduce most of the necessary refinements.
Semi-automatic translation to other languages is also feasible, because the
program below does not make extensive use of features that are peculiar to
\PASCAL.
A large piece of software like \TeX\ has inherent complexity that cannot
be reduced below a certain level of difficulty, although each individual
part is fairly simple by itself. The \.{WEB} language is intended to make
the algorithms as readable as possible, by reflecting the way the
individual program pieces fit together and by providing the
cross-references that connect different parts. Detailed comments about
what is going on, and about why things were done in certain ways, have
been liberally sprinkled throughout the program. These comments explain
features of the implementation, but they rarely attempt to explain the
\TeX\ language itself, since the reader is supposed to be familiar with
{\sl The \TeX book}.
\fi
\M2\*. The present implementation has a long ancestry, beginning in the summer
of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded
a prototype
based on some specifications that the author had made in May of that year.
This original proto\TeX\ included macro definitions and elementary
manipulations on boxes and glue, but it did not have line-breaking,
page-breaking, mathematical formulas, alignment routines, error recovery,
or the present semantic nest; furthermore,
it used character lists instead of token lists, so that a control sequence
like \.{\\halign} was represented by a list of seven characters. A
complete version of \TeX\ was designed and coded by the author in late
1977 and early 1978; that program, like its prototype, was written in the
{\mc SAIL} language, for which an excellent debugging system was
available. Preliminary plans to convert the {\mc SAIL} code into a form
somewhat like the present ``web'' were developed by Luis Trabb~Pardo and
the author at the beginning of 1979, and a complete implementation was
created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which
was written by the author during the latter part of 1981 and the early
part of 1982, also incorporates ideas from the 1979 implementation of
\TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick,
and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred
refinements were introduced into \TeX82 based on the experiences gained with
the original implementations, so that essentially every part of the system
has been substantially improved. After the appearance of ``Version 0'' in
September, 1982, this program benefited greatly from the comments of
many other people, notably David~R. Fuchs and Howard~W. Trickey.
No doubt there still is plenty of room for improvement, but the author
is firmly committed to keeping \TeX82 ``frozen'' from now on; stability
and reliability are to be its main virtues.
On the other hand, the \.{WEB} description can be extended without changing
the core of \TeX82 itself, and the program has been designed so that such
extensions are not extremely difficult to make.
The \\{banner} string defined here should be changed whenever \TeX\
undergoes any modifications, so that it will be clear which version of
\TeX\ might be the guilty party when a problem arises.
If this program is changed, the resulting system should not be called
`\TeX'; the official name `\TeX' by itself is reserved
for software systems that are fully compatible with each other.
A special test suite called the ``\.{TRIP} test'' is available for
helping to determine whether a particular implementation deserves to be
known as `\TeX' [cf.~Stanford Computer Science report CS1027,
November 1984].
\Y\P\D \37$\\{banner}\S\.{\'This\ is\ TeX,\ Version\ 2.2\'}$\C{printed when %
\TeX\ starts}\par
\fi
\M3. Different \PASCAL s have slightly different conventions, and the present
program expresses \TeX\ in terms of the \PASCAL\ that was
available to the author in 1982. The methods used here to work with
this particular compiler, which we shall call \ph, should help the
reader to see how to make an appropriate interface for other systems
if necessary. (\ph\ is Charles Hedrick's modification of a compiler
for the DECsystem-10 that was originally developed at the University of
Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976),
29--42. The \TeX\ program below is intended to be adaptable, without
extensive changes, to most other versions of \PASCAL, so it does not fully
use the admirable features of \ph. Indeed, a conscious effort has been
made here to avoid using several idiosyncratic features of standard
\PASCAL\ itself, so that most of the code can be translated mechanically
into other high-level languages. For example, the `\&{with}' and `\\{new}'
features are not used, nor are pointer types, set types, or enumerated
scalar types; there are no `\&{var}' parameters, except in the case of files;
there are no tag fields on variant records; there are no assignments
$\\{real}\K\\{integer}$; no procedures are declared local to other procedures.)
The portions of this program that involve system-dependent code, where
changes might be necessary because of differences between \PASCAL\ compilers
and/or differences between
operating systems, can be identified by looking at the sections whose
numbers are listed under `system dependencies' in the index. Furthermore,
the index entries for `dirty \PASCAL' list all places where the restrictions
of \PASCAL\ have not been followed perfectly, for one reason or another.
\fi
\M4. The program begins with a normal \PASCAL\ program heading, whose
components will be filled in later, using the conventions of \.{WEB}.
For example, the portion of the program called `\X\glob:Global
variables\X' here will be replaced by a sequence of variable declarations
that starts in $\section\glob$ of this documentation. In this way, we are able
to define each individual global variable when we are prepared to
understand what it means; we do not have to define all of the globals at
once. Cross references in $\section\glob$, where it says ``See also
sections \gglob, \dots,'' also make it possible to look at the set of
all global variables, if desired. Similar remarks apply to the other
portions of the program heading.
Actually the heading shown here is not quite normal: The \&{program}\ line
does not mention any \\{output} file, because \ph\ would ask the \TeX\ user
to specify a file name if \\{output} were specified here.
\Y\P\D \37$\\{mtype}\S\|t\J\|y\J\|p\J\|e$\C{this is a \.{WEB} coding trick:}\par
\P\F \37$\\{mtype}\S\\{type}$\C{`\&{mtype}' will be equivalent to `\&{type}'}%
\par
\P\F \37$\\{type}\S\\{true}$\C{but `\\{type}' will not be treated as a reserved
word}\par
\Y\P\hbox{\4}\X9\*:Compiler directives\X\6
\4\&{program}\1\ \37\\{TEX};\C{all file names are defined dynamically}\6
\4\&{label} \37\X6:Labels in the outer block\X\6
\4\&{const} \37\X11\*:Constants in the outer block\X\6
\4\&{mtype} \37\X18:Types in the outer block\X\6
\4\&{var} \37\X13:Global variables\X\7
\4\&{procedure}\1\ \37\\{initialize};\C{this procedure gets things started
properly}\6
\4\&{var} \37\X19:Local variables for initialization\X\2\6
\&{begin} \37\X8\*:Initialize whatever \TeX\ might access\X\6
\&{end};\7
\hbox{\4}\X57:Basic printing procedures\X\6
\hbox{\4}\X34\*:Error handling procedures\X\par
\fi
\M5. The overall \TeX\ program begins with the heading just shown, after which
comes a bunch of procedure declarations and function declarations.
Finally we will get to the main program, which begins with the
comment `\\{start\_here}'. If you want to skip down to the
main program now, you can look up `\\{start\_here}' in the index.
But the author suggests that the best way to understand this program
is to follow pretty much the order of \TeX's components as they appear in the
\.{WEB} description you are now reading, since the present ordering is
intended to combine the advantages of the ``bottom up'' and ``top down''
approaches to the problem of understanding a somewhat complicated system.
\fi
\M6. Three labels must be declared in the main program, so we give them
symbolic names.
\Y\P\D \37$\\{start\_of\_TEX}=1$\C{go here when \TeX's variables are
initialized}\par
\P\D \37$\\{end\_of\_TEX}=9998$\C{go here to close files and terminate
gracefully}\par
\P\D \37$\\{final\_end}=9999$\C{this label marks the ending of the program}\par
\Y\P$\4\X6:Labels in the outer block\X\S$\6
$\\{start\_of\_TEX}\hbox{\hskip-2pt},\39\\{end\_of\_TEX}\hbox{\hskip-2pt},\39\,%
\\{final\_end}$;\C{key control points}\par
\U section~4.\fi
\M7\*. Some of the code below is intended to be used only when diagnosing the
strange behavior that sometimes occurs when \TeX\ is being installed or
when system wizards are fooling around with \TeX\ without quite knowing
what they are doing. Such code will not normally be compiled; it is
delimited by the codewords `$ \&{debug} \ldots \&{gubed} $', with apologies
to people who wish to preserve the purity of English.
Similarly, there is some conditional code delimited by
`$ \&{stat} \ldots \&{tats} $' that is intended for use when statistics are to
be
kept about \TeX's memory usage. The \&{stat} $\ldots$ \&{tats} code also
implements diagnostic information for \\{tracing\_paragraphs} and
\\{tracing\_pages}.
\Y\P\D \37$\\{debug}\S$\C{change this to `$\\{debug}\equiv\null$' when
debugging}\par
\P\D \37$\\{gubed}\S$\C{change this to `$\\{gubed}\equiv\null$' when debugging}%
\par
\P\F \37$\\{debug}\S\\{begin}$\par
\P\F \37$\\{gubed}\S\\{end}$\Y\par
\P\D \37$\\{stat}\S$\C{change this to `$\\{stat}\equiv\.{@\{}$' when not
gathering usage statistics}\par
\P\D \37$\\{tats}\S$\C{change this to `$\\{tats}\equiv\.{@\}}$' when not
gathering usage statistics}\par
\P\F \37$\\{stat}\S\\{begin}$\par
\P\F \37$\\{tats}\S\\{end}$\par
\fi
\M8\*. This program has two important variations: (1) There is a long and slow
version called \.{INITEX}, which does the extra calculations needed to
initialize \TeX's internal tables; and (2)~there is a shorter and faster
production version, which cuts the initialization to a bare minimum.
Parts of the program that are needed in (1) but not in (2) are delimited by
the codewords `$ \&{init} \ldots \&{tini} $'.
\Y\P\D \37$\\{init}\S$\C{change this to `$\\{init}\equiv\.{@\{}$' in the
production version}\par
\P\D \37$\\{tini}\S$\C{change this to `$\\{tini}\equiv\.{@\}}$' in the
production version}\par
\P\F \37$\\{init}\S\\{begin}$\par
\P\F \37$\\{tini}\S\\{end}$\par
\Y\P$\4\X8\*:Initialize whatever \TeX\ might access\X\S$\6
\X21:Set initial values of key variables\X\6
\&{init} \37\X164:Initialize table entries (done by \.{INITEX} only)\X\ %
\&{tini}\par
\U section~4.\fi
\M9\*. If the first character of a \PASCAL\ comment is a dollar sign,
\ph\ treats the comment as a list of ``compiler directives'' that will
affect the translation of this program into machine language. The
directives shown below specify full checking and inclusion of the \PASCAL\
debugger when \TeX\ is being debugged, but they cause range checking and other
redundant code to be eliminated when the production system is being generated.
Arithmetic overflow will be detected in all cases.
\Y\P$\4\X9\*:Compiler directives\X\S$\6
$\B\J\$\|C-,\39\|A+,\39\|D-,\39\|W+\T$\C{no range check, catch arithmetic
overflow, no debug overhead}\6
\&{debug} \37$\B\J\$\|C+,\39\|D$: \37$5,\39\|W+,\39\|Z$: \37$\={377777777777B}%
\T$\ \&{gubed}\C{but turn everything on when debugging}\6
\C{the `$\|W+$' switch catches more syntax errors}\6
\C{the `\|Z' switch sets variables to maxint before they receive a value}\6
\C{the `\ignorespaces \|D: 5' avoids initial stop for the debugger}\par
\U section~4.\fi
\M10. This \TeX\ implementation conforms to the rules of the {\sl Pascal User
Manual} published by Jensen and Wirth in 1975, except where system-dependent
code is necessary to make a useful system program, and except in another
respect where such conformity would unnecessarily obscure the meaning
and clutter up the code: We assume that \&{case} statements may include a
default case that applies if no matching label is found. Thus, we shall use
constructions like
$$\vbox{\halign{\ignorespaces#\hfil\cr
\&{case} $\|x$ \&{of}\cr
1: $\langle\,$code for $x=1\,\rangle$;\cr
3: $\langle\,$code for $x=3\,\rangle$;\cr
\&{othercases} $\langle\,$code for $\|x\I1$ and $\|x\I3$$\,\rangle$\cr
\&{endcases} \cr}}$$
since most \PASCAL\ compilers have plugged this hole in the language by
incorporating some sort of default mechanism. For example, the \ph\
compiler allows `\\{others}:' as a default label, and other \PASCAL s allow
syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The
definitions of \&{othercases} and \&{endcases} should be changed to agree
with
local conventions. Note that no semicolon appears before \&{endcases} in
this program, so the definition of \&{endcases} should include a semicolon
if the compiler wants one. (Of course, if no default mechanism is
available, the \&{case} statements of \TeX\ will have to be laboriously
extended by listing all remaining cases. People who are stuck with such
\PASCAL s have in fact done this, successfully but not happily!)
\Y\P\D \37$\\{othercases}\S\\{others}$: \37\C{default for cases not listed
explicitly}\par
\P\D \37$\\{endcases}\S$\ \&{end} \C{follows the default case in an extended %
\&{case} statement}\par
\P\F \37$\\{othercases}\S\\{else}$\par
\P\F \37$\\{endcases}\S\\{end}$\par
\fi
\M11\*. The following parameters can be changed at compile time to extend or
reduce \TeX's capacity. They may have different values in \.{INITEX} and
in production versions of \TeX.
\Y\P$\4\X11\*:Constants in the outer block\X\S$\6
$\\{mem\_max}=3000$;\C{greatest index in \TeX's internal \\{mem} array; must
be strictly less than \\{max\_halfword}; must be equal to \\{mem\_top} in %
\.{INITEX}, otherwise $\G\\{mem\_top}$}\6
$\\{mem\_min}=1$;\C{smallest index in \TeX's internal \\{mem} array; must be %
\\{min\_halfword} or more; must be equal to \\{mem\_bot} in \.{INITEX},
otherwise $\L\\{mem\_bot}$}\6
$\\{buf\_size}=500$;\C{maximum number of characters simultaneously present in
current lines of open files and in control sequences between \.{\\csname} and
\.{\\endcsname}; must not exceed \\{max\_halfword}}\6
$\\{error\_line}=64$;\C{width of context lines on terminal error messages}\6
$\\{half\_error\_line}=32$;\C{width of first lines of contexts in terminal
error messages; should be between 30 and $\\{error\_line}-15$}\6
$\\{max\_print\_line}=72$;\C{width of longest text lines output; should be at
least 60}\6
$\\{stack\_size}=200$;\C{maximum number of simultaneous input sources}\6
$\\{max\_in\_open}=6$;\C{maximum number of input files and error insertions
that can be going on simultaneously}\6
$\\{font\_max}=75$;\C{maximum internal font number; must not exceed \\{max%
\_quarterword}}\6
$\\{font\_mem\_size}=20000$;\C{number of words of \\{font\_info} for all fonts}%
\6
$\\{param\_size}=60$;\C{maximum number of simultaneous macro parameters}\6
$\\{nest\_size}=40$;\C{maximum number of semantic levels simultaneously active}%
\6
$\\{max\_strings}=3000$;\C{maximum number of strings}\6
$\\{string\_vacancies}=8000$;\C{the minimum number of characters that should be
available for the user's control sequences and font names, after \TeX's own
error messages are stored}\6
$\\{pool\_size}=32000$;\C{maximum number of characters in strings, including
all error messages and help texts, and the names of all fonts and control
sequences; must exceed \\{string\_vacancies} by the total length of \TeX's own
strings, which is currently about 22500}\6
$\\{save\_size}=600$;\C{space for saving values outside of current group; must
be at most \\{max\_halfword}}\6
$\\{trie\_size}=8000$;\C{space for hyphenation patterns; should be larger for %
\.{INITEX} than it is in production versions of \TeX}\6
$\\{dvi\_buf\_size}=800$;\C{size of the output buffer; must be a multiple of 8}%
\6
$\\{file\_name\_size}=23$;\C{file names shouldn't be longer than this}\6
$\\{pool\_name}=\.{\'TEX.POOL[TEX,DEK]\ \ \ \ \ \ \'}$;\C{string of length %
\\{file\_name\_size}; tells where the string pool appears}\par
\U section~4.\fi
\M12\*. Like the preceding parameters, the following quantities can be changed
at compile time to extend or reduce \TeX's capacity. But if they are changed,
it is necessary to rerun the initialization program \.{INITEX}
to generate new tables for the production \TeX\ program.
One can't simply make helter-skelter changes to the following constants,
since certain rather complex initialization
numbers are computed from them. They are defined here using
\.{WEB} macros, instead of being put into \PASCAL's \&{const} list, in order
to
emphasize this distinction.
\Y\P\D \37$\\{mem\_bot}=1$\C{smallest index in the \\{mem} array dumped by %
\.{INITEX}; must not be less than \\{mem\_min}}\par
\P\D \37$\\{mem\_top}\S3000$\C{largest index in the \\{mem} array dumped by %
\.{INITEX}; must be substantially larger than \\{mem\_bot} and not greater
than \\{mem\_max}}\par
\P\D \37$\\{font\_base}=0$\C{smallest internal font number; must not be less
than \\{min\_quarterword}}\par
\P\D \37$\\{hash\_size}=2100$\C{maximum number of control sequences; it should
be at most about $(\\{mem\_max}-\\{hi\_mem\_base})/6$, but 2100 is already
quite generous}\par
\P\D \37$\\{hash\_prime}=1777$\C{a prime number equal to about 85\% of \\{hash%
\_size}}\par
\P\D \37$\\{hyph\_size}=307$\C{another prime; the number of \.{\\hyphenation}
exceptions}\par
\fi
\M13. In case somebody has inadvertently made bad settings of the
``constants,''
\TeX\ checks them using a global variable called \\{bad}.
This is the first of many sections of \TeX\ where global variables are
defined.
\Y\P$\4\X13:Global variables\X\S$\6
\4\\{bad}: \37\\{integer};\C{is some ``constant'' wrong?}\par
\A sections~20, 26, 30\*, 39, 50, 54, 73, 76, 79, 96, 104, 115, 116\*, 117,
118, 124, 165, 173, 181, 213, 246, 253, 256, 271, 286, 297, 301, 304\*, 305,
308, 309, 310, 333, 361, 382, 387, 388, 410, 438, 447, 480, 489, 493\*, 512,
513\*, 520\*, 527, 532, 539, 549, 550, 555, 592, 595\*, 605, 616, 646, 647,
661, 684, 719, 724, 764, 770, 814, 821, 823, 825, 828, 833, 839, 847, 872, 892,
900, 905, 921, 926, 943, 945, 946, 950, 971, 980, 982, 989, 1074, 1266, 1281,
1299, 1305, 1331, 1342, 1345, 1377\*, and~1379\*.
\U section~4.\fi
\M14. Later on we will say `\ignorespaces \&{if} $\\{mem\_max}\G\\{max%
\_halfword}$ \&{then} $\\{bad}\K14$',
or something similar. (We can't do that until \\{max\_halfword} has been
defined.)
\Y\P$\4\X14:Check the ``constant'' values for consistency\X\S$\6
$\\{bad}\K0$;\6
\&{if} $(\\{half\_error\_line}<30)\V(\\{half\_error\_line}>\\{error\_line}-15)$
\1\&{then}\5
$\\{bad}\K1$;\2\6
\&{if} $\\{max\_print\_line}<60$ \1\&{then}\5
$\\{bad}\K2$;\2\6
\&{if} $\\{dvi\_buf\_size}\mathbin{\&{mod}}8\I0$ \1\&{then}\5
$\\{bad}\K3$;\2\6
\&{if} $\\{mem\_bot}+1100>\\{mem\_top}$ \1\&{then}\5
$\\{bad}\K4$;\2\6
\&{if} $\\{hash\_prime}>\\{hash\_size}$ \1\&{then}\5
$\\{bad}\K5$;\2\6
\&{if} $\\{max\_in\_open}\G128$ \1\&{then}\5
$\\{bad}\K6$;\2\par
\A sections~111, 290, 522, and~1249.
\U section~1332.\fi
\M15. Labels are given symbolic names by the following definitions, so that
occasional \&{goto} statements will be meaningful. We insert the label
`\\{exit}:' just before the `\ignorespaces \&{end} \unskip' of a procedure in
which we have used the `\&{return}' statement defined below; the label
`\\{restart}' is occasionally used at the very beginning of a procedure; and
the label `\\{reswitch}' is occasionally used just prior to a \&{case}
statement in which some cases change the conditions and we wish to branch
to the newly applicable case. Loops that are set up with the \~ \&{loop}
construction defined below are commonly exited by going to `\\{done}' or to
`\\{found}' or to `\\{not\_found}', and they are sometimes repeated by going to
`\\{continue}'. If two or more parts of a subroutine start differently but
end up the same, the shared code may be gathered together at
`\\{common\_ending}'.
Incidentally, this program never declares a label that isn't actually used,
because some fussy \PASCAL\ compilers will complain about redundant labels.
\Y\P\D \37$\\{exit}=10$\C{go here to leave a procedure}\par
\P\D \37$\\{restart}=20$\C{go here to start a procedure again}\par
\P\D \37$\\{reswitch}=21$\C{go here to start a case statement again}\par
\P\D \37$\\{continue}=22$\C{go here to resume a loop}\par
\P\D \37$\\{done}=30$\C{go here to exit a loop}\par
\P\D \37$\\{done1}=31$\C{like \\{done}, when there is more than one loop}\par
\P\D \37$\\{done2}=32$\C{for exiting the second loop in a long block}\par
\P\D \37$\\{done3}=33$\C{for exiting the third loop in a very long block}\par
\P\D \37$\\{done4}=34$\C{for exiting the fourth loop in an extremely long
block}\par
\P\D \37$\\{done5}=35$\C{for exiting the fifth loop in an immense block}\par
\P\D \37$\\{done6}=36$\C{for exiting the sixth loop in a block}\par
\P\D \37$\\{found}=40$\C{go here when you've found it}\par
\P\D \37$\\{found1}=41$\C{like \\{found}, when there's more than one per
routine}\par
\P\D \37$\\{found2}=42$\C{like \\{found}, when there's more than two per
routine}\par
\P\D \37$\\{not\_found}=45$\C{go here when you've found nothing}\par
\P\D \37$\\{common\_ending}=50$\C{go here when you want to merge with another
branch}\par
\fi
\M16. Here are some macros for common programming idioms.
\Y\P\D \37$\\{incr}(\#)\S\#\K\#+1$\C{increase a variable by unity}\par
\P\D \37$\\{decr}(\#)\S\#\K\#-1$\C{decrease a variable by unity}\par
\P\D \37$\\{negate}(\#)\S\#\K-\#$\C{change the sign of a variable}\par
\P\D \37$\\{loop}\S$\ \&{while} $\\{true}$ \1\&{do}\ \C{repeat over and over
until a \&{goto} happens}\par
\P\F \37$\\{loop}\S\\{xclause}$\C{\.{WEB}'s \~ \&{xclause} acts like `%
\ignorespaces \&{while} $\\{true}$ \&{do}\unskip'}\par
\P\D \37$\\{do\_nothing}\S$\C{empty statement}\par
\P\D \37$\\{return}\S$\1\5
\&{goto} \37\\{exit}\C{terminate a procedure call}\2\par
\P\F \37$\\{return}\S\\{nil}$\par
\P\D \37$\\{empty}=0$\C{symbolic name for a null constant}\par
\fi
\N17. \[2] The character set.
In order to make \TeX\ readily portable between a wide variety of
computers, all of its input text is converted to an internal seven-bit
code that is essentially standard ASCII, the ``American Standard Code for
Information Interchange.'' This conversion is done immediately when each
character is read in. Conversely, characters are converted from ASCII to
the user's external representation just before they are output to a
text file.
Such an internal code is relevant to users of \TeX\ primarily because it
governs the positions of characters in the fonts. For example, the
character `\.A' has ASCII code $65=\O{101}$, and when \TeX\ typesets
this letter it specifies character number 65 in the current font.
If that font actually has `\.A' in a different position, \TeX\ doesn't
know what the real position is; the program that does the actual printing from
\TeX's device-independent files is responsible for converting from ASCII to
a particular font encoding.
\TeX's internal code is relevant also with respect to constants
that begin with a reverse apostrophe; and it provides an index to the
\.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode}
tables.
\fi
\M18. Characters of text that have been converted to \TeX's internal form
are said to be of type \\{ASCII\_code}, which is a subrange of the integers.
\Y\P$\4\X18:Types in the outer block\X\S$\6
$\\{ASCII\_code}=0\to127$;\C{seven-bit numbers}\par
\A sections~25, 38, 101, 109, 113, 150, 212, 269, 300, 548, 594\*, 920, and~925.
\U section~4.\fi
\M19. The original \PASCAL\ compiler was designed in the late 60s, when six-bit
character sets were common, so it did not make provision for lowercase
letters. Nowadays, of course, we need to deal with both capital and small
letters in a convenient way, especially in a program for typesetting;
so the present specification of \TeX\ has been written under the assumption
that the \PASCAL\ compiler and run-time system permit the use of text files
with more than 64 distinguishable characters. More precisely, we assume that
the character set contains at least the letters and symbols associated
with ASCII codes \O{40} through \O{176}; all of these characters are now
available on most computer terminals.
Since we are dealing with more characters than were present in the first
\PASCAL\ compilers, we have to decide what to call the associated data
type. Some \PASCAL s use the original name \\{char} for the
characters in text files, even though there now are more than 64 such
characters, while other \PASCAL s consider \\{char} to be a 64-element
subrange of a larger data type that has some other name.
In order to accommodate this difference, we shall use the name \\{text\_char}
to stand for the data type of the characters that are converted to and
from \\{ASCII\_code} when they are input and output. We shall also assume
that \\{text\_char} consists of the elements $\\{chr}(\\{first\_text\_char})$
through
$\\{chr}(\\{last\_text\_char})$, inclusive. The following definitions should be
adjusted if necessary.
\Y\P\D \37$\\{text\_char}\S\\{char}$\C{the data type of characters in text
files}\par
\P\D \37$\\{first\_text\_char}=0$\C{ordinal number of the smallest element of %
\\{text\_char}}\par
\P\D \37$\\{last\_text\_char}=127$\C{ordinal number of the largest element of %
\\{text\_char}}\par
\Y\P$\4\X19:Local variables for initialization\X\S$\6
\4\|i: \37$0\to\\{last\_text\_char}$;\par
\A sections~163 and~927.
\U section~4.\fi
\M20. The \TeX\ processor converts between ASCII code and
the user's external character set by means of arrays \\{xord} and \\{xchr}
that are analogous to \PASCAL's \\{ord} and \\{chr} functions.
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{xord}: \37\&{array} $[\\{text\_char}]$ \1\&{of}\5
\\{ASCII\_code};\C{specifies conversion of input characters}\2\6
\4\\{xchr}: \37\&{array} $[\\{ASCII\_code}]$ \1\&{of}\5
\\{text\_char};\C{specifies conversion of output characters}\2\par
\fi
\M21. Since we are assuming that our \PASCAL\ system is able to read and write
the
visible characters of standard ASCII (although not necessarily using the
ASCII codes to represent them), the following assignment statements initialize
most of the \\{xchr} array properly, without needing any system-dependent
changes. On the other hand, it is possible to implement \TeX\ with
less complete character sets, and in such cases it will be necessary to
change something here.
\Y\P$\4\X21:Set initial values of key variables\X\S$\6
$\\{xchr}[\O{40}]\K\.{\'\ \'}$;\5
$\\{xchr}[\O{41}]\K\.{\'!\'}$;\5
$\\{xchr}[\O{42}]\K\.{\'"\'}$;\5
$\\{xchr}[\O{43}]\K\.{\'\#\'}$;\5
$\\{xchr}[\O{44}]\K\.{\'\$\'}$;\5
$\\{xchr}[\O{45}]\K\.{\'\%\'}$;\5
$\\{xchr}[\O{46}]\K\.{\'\&\'}$;\5
$\\{xchr}[\O{47}]\K\.{\'\'}\.{\'\'}$;\6
$\\{xchr}[\O{50}]\K\.{\'(\'}$;\5
$\\{xchr}[\O{51}]\K\.{\')\'}$;\5
$\\{xchr}[\O{52}]\K\.{\'*\'}$;\5
$\\{xchr}[\O{53}]\K\.{\'+\'}$;\5
$\\{xchr}[\O{54}]\K\.{\',\'}$;\5
$\\{xchr}[\O{55}]\K\.{\'-\'}$;\5
$\\{xchr}[\O{56}]\K\.{\'.\'}$;\5
$\\{xchr}[\O{57}]\K\.{\'/\'}$;\6
$\\{xchr}[\O{60}]\K\.{\'0\'}$;\5
$\\{xchr}[\O{61}]\K\.{\'1\'}$;\5
$\\{xchr}[\O{62}]\K\.{\'2\'}$;\5
$\\{xchr}[\O{63}]\K\.{\'3\'}$;\5
$\\{xchr}[\O{64}]\K\.{\'4\'}$;\5
$\\{xchr}[\O{65}]\K\.{\'5\'}$;\5
$\\{xchr}[\O{66}]\K\.{\'6\'}$;\5
$\\{xchr}[\O{67}]\K\.{\'7\'}$;\6
$\\{xchr}[\O{70}]\K\.{\'8\'}$;\5
$\\{xchr}[\O{71}]\K\.{\'9\'}$;\5
$\\{xchr}[\O{72}]\K\.{\':\'}$;\5
$\\{xchr}[\O{73}]\K\.{\';\'}$;\5
$\\{xchr}[\O{74}]\K\.{\'<\'}$;\5
$\\{xchr}[\O{75}]\K\.{\'=\'}$;\5
$\\{xchr}[\O{76}]\K\.{\'>\'}$;\5
$\\{xchr}[\O{77}]\K\.{\'?\'}$;\6
$\\{xchr}[\O{100}]\K\.{\'@\'}$;\5
$\\{xchr}[\O{101}]\K\.{\'A\'}$;\5
$\\{xchr}[\O{102}]\K\.{\'B\'}$;\5
$\\{xchr}[\O{103}]\K\.{\'C\'}$;\5
$\\{xchr}[\O{104}]\K\.{\'D\'}$;\5
$\\{xchr}[\O{105}]\K\.{\'E\'}$;\5
$\\{xchr}[\O{106}]\K\.{\'F\'}$;\5
$\\{xchr}[\O{107}]\K\.{\'G\'}$;\6
$\\{xchr}[\O{110}]\K\.{\'H\'}$;\5
$\\{xchr}[\O{111}]\K\.{\'I\'}$;\5
$\\{xchr}[\O{112}]\K\.{\'J\'}$;\5
$\\{xchr}[\O{113}]\K\.{\'K\'}$;\5
$\\{xchr}[\O{114}]\K\.{\'L\'}$;\5
$\\{xchr}[\O{115}]\K\.{\'M\'}$;\5
$\\{xchr}[\O{116}]\K\.{\'N\'}$;\5
$\\{xchr}[\O{117}]\K\.{\'O\'}$;\6
$\\{xchr}[\O{120}]\K\.{\'P\'}$;\5
$\\{xchr}[\O{121}]\K\.{\'Q\'}$;\5
$\\{xchr}[\O{122}]\K\.{\'R\'}$;\5
$\\{xchr}[\O{123}]\K\.{\'S\'}$;\5
$\\{xchr}[\O{124}]\K\.{\'T\'}$;\5
$\\{xchr}[\O{125}]\K\.{\'U\'}$;\5
$\\{xchr}[\O{126}]\K\.{\'V\'}$;\5
$\\{xchr}[\O{127}]\K\.{\'W\'}$;\6
$\\{xchr}[\O{130}]\K\.{\'X\'}$;\5
$\\{xchr}[\O{131}]\K\.{\'Y\'}$;\5
$\\{xchr}[\O{132}]\K\.{\'Z\'}$;\5
$\\{xchr}[\O{133}]\K\.{\'[\'}$;\5
$\\{xchr}[\O{134}]\K\.{\'\\\'}$;\5
$\\{xchr}[\O{135}]\K\.{\']\'}$;\5
$\\{xchr}[\O{136}]\K\.{\'\↑\'}$;\5
$\\{xchr}[\O{137}]\K\.{\'\_\'}$;\6
$\\{xchr}[\O{140}]\K\.{\'\`\'}$;\5
$\\{xchr}[\O{141}]\K\.{\'a\'}$;\5
$\\{xchr}[\O{142}]\K\.{\'b\'}$;\5
$\\{xchr}[\O{143}]\K\.{\'c\'}$;\5
$\\{xchr}[\O{144}]\K\.{\'d\'}$;\5
$\\{xchr}[\O{145}]\K\.{\'e\'}$;\5
$\\{xchr}[\O{146}]\K\.{\'f\'}$;\5
$\\{xchr}[\O{147}]\K\.{\'g\'}$;\6
$\\{xchr}[\O{150}]\K\.{\'h\'}$;\5
$\\{xchr}[\O{151}]\K\.{\'i\'}$;\5
$\\{xchr}[\O{152}]\K\.{\'j\'}$;\5
$\\{xchr}[\O{153}]\K\.{\'k\'}$;\5
$\\{xchr}[\O{154}]\K\.{\'l\'}$;\5
$\\{xchr}[\O{155}]\K\.{\'m\'}$;\5
$\\{xchr}[\O{156}]\K\.{\'n\'}$;\5
$\\{xchr}[\O{157}]\K\.{\'o\'}$;\6
$\\{xchr}[\O{160}]\K\.{\'p\'}$;\5
$\\{xchr}[\O{161}]\K\.{\'q\'}$;\5
$\\{xchr}[\O{162}]\K\.{\'r\'}$;\5
$\\{xchr}[\O{163}]\K\.{\'s\'}$;\5
$\\{xchr}[\O{164}]\K\.{\'t\'}$;\5
$\\{xchr}[\O{165}]\K\.{\'u\'}$;\5
$\\{xchr}[\O{166}]\K\.{\'v\'}$;\5
$\\{xchr}[\O{167}]\K\.{\'w\'}$;\6
$\\{xchr}[\O{170}]\K\.{\'x\'}$;\5
$\\{xchr}[\O{171}]\K\.{\'y\'}$;\5
$\\{xchr}[\O{172}]\K\.{\'z\'}$;\5
$\\{xchr}[\O{173}]\K\.{\'\{\'}$;\5
$\\{xchr}[\O{174}]\K\.{\'|\'}$;\5
$\\{xchr}[\O{175}]\K\.{\'\}\'}$;\5
$\\{xchr}[\O{176}]\K\.{\'\~\'}$;\6
$\\{xchr}[0]\K\.{\'\ \'}$;\5
$\\{xchr}[\O{177}]\K\.{\'\ \'}$;\C{ASCII codes 0 and \O{177} do not appear in
text}\par
\A sections~23\*, 24, 74, 77, 80, 97, 166, 215, 254, 257, 272, 287, 383, 439,
481, 490, 521\*, 551, 556, 593, 596, 606, 648, 662, 685, 771, 928, 990, 1267,
1282, 1300, 1343, 1378\*, and~1380\*.
\U section~8\*.\fi
\M22. Some of the ASCII codes without visible characters have been given
symbolic
names in this program because they are used with a special meaning.
\Y\P\D \37$\\{null\_code}=\O{0}$\C{ASCII code that might disappear}\par
\P\D \37$\\{carriage\_return}=\O{15}$\C{ASCII code used at end of line}\par
\P\D \37$\\{invalid\_code}=\O{177}$\C{ASCII code that should not appear}\par
\fi
\M23\*. The ASCII code is ``standard'' only to a certain extent, since many
computer installations have found it advantageous to have ready access
to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/}
gives a complete specification of the intended correspondence between
characters and \TeX's internal representation.
If \TeX\ is being used
on a garden-variety \PASCAL\ for which only standard ASCII
codes will appear in the input and output files, it doesn't really matter
what codes are specified in $\\{xchr}[1\to\O{37}]$, but the safest policy is to
blank everything out by using the code shown below.
However, other settings of \\{xchr} will make \TeX\ more friendly on
computers that have an extended character set, so that users can type things
like `\.↑↑Z' instead of `\.{\\ne}'. At MIT, for example, it would be more
appropriate to substitute the code
$$\hbox{ \&{for} $\|i\K1\mathrel{\&{to}}\O{37}$ \&{do} $\\{xchr}[\|i]\K\\{chr}(%
\|i)$;}$$
\TeX's character set is essentially the same as MIT's, even with respect to
characters less than~\O{40}. People with extended character sets can
assign codes arbitrarily, giving an \\{xchr} equivalent to whatever
characters the users of \TeX\ are allowed to have in their input files.
It is best to make the codes correspond to the intended interpretations as
shown in Appendix~C whenever possible; but this is not necessary. For
example, in countries with an alphabet of more than 26 letters, it is
usually best to map the additional letters into codes less than~\O{40}.
The code shown here is intended to be used on the Stanford {\sc SAIL} system,
and at other installations like CMU and ISI where essentially the same
extended character set is used. The fact that {\mc SAIL} has \.{\'\}\'} in the
wrong place turns out to cause no difficulty in this case.
\Y\P$\4\X21:Set initial values of key variables\X\mathrel{+}\S$\6
\&{for} $\|i\K1\mathrel{\&{to}}\O{37}$ \1\&{do}\5
$\\{xchr}[\|i]\K\\{chr}(\|i)$;\2\6
$\\{xchr}[\O{30}]\K\\{chr}(\O{137})$;\5
$\\{xchr}[\O{32}]\K\\{chr}(\O{33})$;\C{\\{not\_equal} sign}\6
$\\{xchr}[\O{33}]\K\\{chr}(\O{176})$;\par
\fi
\M24. The following system-independent code makes the \\{xord} array contain a
suitable inverse to the information in \\{xchr}. Note that if $\\{xchr}[\|i]=%
\\{xchr}[\|j]$
where $\|i<\|j<\O{177}$, the value of $\\{xord}[\\{xchr}[\|i]]$ will turn out
to be
\|j or more; hence, standard ASCII code numbers will be used instead of
codes below \O{40} in case there is a coincidence.
\Y\P$\4\X21:Set initial values of key variables\X\mathrel{+}\S$\6
\&{for} $\|i\K\\{first\_text\_char}\mathrel{\&{to}}\\{last\_text\_char}$ \1%
\&{do}\5
$\\{xord}[\\{chr}(\|i)]\K\\{invalid\_code}$;\2\6
\&{for} $\|i\K1\mathrel{\&{to}}\O{176}$ \1\&{do}\5
$\\{xord}[\\{xchr}[\|i]]\K\|i$;\2\par
\fi
\N25. \[3] Input and output.
The bane of portability is the fact that different operating systems treat
input and output quite differently, perhaps because computer scientists
have not given sufficient attention to this problem. People have felt somehow
that input and output are not a part of ``real'' programming. Well, it is true
that some kinds of programming are more fun than others. With existing
input/output conventions being so diverse and so messy, the only sources of
joy in such parts of the code are the rare occasions when one can find a
way to make the program a little less bad than it might have been. We have
two choices: either to attack I/O now and get it over with, or to postpone
it until near the end. Neither prospect is very attractive, so let's
get it over with.
The basic operations we need to do are (1)~inputting and outputting of
text, to or from a file or the user's terminal; (2)~inputting and
outputting of eight-bit bytes, to or from a file; (3)~instructing the
operating system to initiate (``open'') or to terminate (``close'') input or
output from a specified file; (4)~testing whether the end of an input
file has been reached.
\TeX\ needs to deal with two kinds of files.
We shall use the term \\{alpha\_file} for a file that contains textual data,
and the term \\{byte\_file} for a file that contains eight-bit binary
information.
These two types turn out to be the same on many computers, but
sometimes there is a significant distinction, so we shall be careful to
distinguish between them. Standard protocols for transferring
such files from computer to computer, via high-speed networks, are
now becoming available to more and more communities of users.
The program actually makes use also of a third kind of file, called a
\\{word\_file}, when dumping and reloading base information for its own
initialization. We shall define a word file later; but it will be possible
for us to specify simple operations on word files before they are defined.
\Y\P$\4\X18:Types in the outer block\X\mathrel{+}\S$\6
$\\{eight\_bits}=0\to255$;\C{unsigned one-byte quantity}\6
$\\{alpha\_file}=$\1\5
\&{packed} \37\&{file} \1\&{of}\5
\\{text\_char};\C{files that contain textual data}\2\2\6
$\\{byte\_file}=$\1\5
\&{packed} \37\&{file} \1\&{of}\5
\\{eight\_bits};\C{files that contain binary data}\2\2\par
\fi
\M26. Most of what we need to do with respect to input and output can be
handled
by the I/O facilities that are standard in \PASCAL, i.e., the routines
called \\{get}, \\{put}, \\{eof}, and so on. But
standard \PASCAL\ does not allow file variables to be associated with file
names that are determined at run time, so it cannot be used to implement
\TeX; some sort of extension to \PASCAL's ordinary \\{reset} and \\{rewrite}
is crucial for our purposes. We shall assume that \\{name\_of\_file} is a
variable
of an appropriate type such that the \PASCAL\ run-time system being used to
implement \TeX\ can open a file whose external name is specified by
\\{name\_of\_file}.
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{name\_of\_file}: \37\&{packed} \37\&{array} $[1\to\\{file\_name\_size}]$ %
\1\&{of}\5
\\{char};\2\6
\C{on some systems this may be a \&{record} variable}\6
\4\\{name\_length}: \37$0\to\\{file\_name\_size}$;\6
\C{this many characters are actually relevant in \\{name\_of\_file} (the rest
are blank)}\par
\fi
\M27\*. The \ph\ compiler with which the present version of \TeX\ was prepared
has
extended the rules of \PASCAL\ in a very convenient way. To open file~\|f,
we can write
$$\vbox{\halign{#\hfil\qquad\hfil\cr
$\\{reset}(\|f,\hbox{\\{name}},\.{\'/O\'})$&for input;\cr
$\\{rewrite}(\|f,\hbox{\\{name}},\.{\'/O\'})$&for output.\cr}}$$
The `\\{name}' parameter, which is of type `{\bf packed array
$[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of
the external file that is being opened for input or output.
Blank spaces that might appear in \\{name} are ignored.
The `\.{/O}' parameter tells the operating system not to issue its own
error messages if something goes wrong. If a file of the specified name
cannot be found, or if such a file cannot be opened for some other reason
(e.g., someone may already be trying to write the same file), we will have
$\\{erstat}(\|f)\I0$ after an unsuccessful \\{reset} or \\{rewrite}. This
allows
\TeX\ to undertake appropriate corrective action.
\TeX's file-opening procedures return \\{false} if no file identified by
\\{name\_of\_file} could be opened.
\Y\P\D \37$\\{reset\_OK}(\#)\S\\{erstat}(\#)\mathbin{\&{mod}}\O{20000}=0$\par
\P\D \37$\\{rewrite\_OK}(\#)\S\\{erstat}(\#)\mathbin{\&{mod}}\O{20000}=0$\par
\Y\P\4\&{function} \1\ \\{erstat} ( $\mathop{\&{var}}\|f:$ \&{file} ) : %
\\{integer};\5
\\{extern};\5
\hbox{/2}\7
\4\&{function}\1\ \37$\\{a\_open\_in}(\mathop{\&{var}}\|f:\\{alpha\_file})$: %
\37\\{boolean};\C{open a text file for input}\2\6
\&{begin} \37$\\{reset}(\|f,\39\\{name\_of\_file},\39\.{\'/E/O/N:9\'})$;\C{the %
\.{/E} switch distinguishes \\{form\_feed} from \\{carriage\_return}; the %
\.{/O} switch gives error control to us; and the \.{/N:9} switch specifies 9
buffers, which seems to work satisfactorily at {\mc SAIL}}\6
$\\{a\_open\_in}\K\\{reset\_OK}(\|f)$;\6
\&{end};\7
\4\&{function}\1\ \37$\\{a\_open\_out}(\mathop{\&{var}}\|f:\\{alpha\_file})$: %
\37\\{boolean};\C{open a text file for output}\2\6
\&{begin} \37$\\{rewrite}(\|f,\39\\{name\_of\_file},\39\.{\'/O/N:2\'})$;\5
$\\{a\_open\_out}\K\\{rewrite\_OK}(\|f)$;\6
\&{end};\C{two buffers seems adequate for text output files}\7
\4\&{function}\1\ \37$\\{b\_open\_in}(\mathop{\&{var}}\|f:\\{byte\_file})$: %
\37\\{boolean};\C{open a binary file for input}\2\6
\&{begin} \37$\\{reset}(\|f,\39\\{name\_of\_file},\39\.{\'/B:8/O/N:2\'})$;\5
$\\{b\_open\_in}\K\\{reset\_OK}(\|f)$;\6
\&{end};\C{the \.{/B} switch is necessary to get byte packing}\7
\4\&{function}\1\ \37$\\{b\_open\_out}(\mathop{\&{var}}\|f:\\{byte\_file})$: %
\37\\{boolean};\C{open a binary file for output}\2\6
\&{begin} \37$\\{rewrite}(\|f,\39\\{name\_of\_file},\39\.{\'/O/N:9/P:256\'})$;\5
$\\{b\_open\_out}\K\\{rewrite\_OK}(\|f)$;\6
\&{end};\C{here we use \\{ary\_out} so the \.{/B} switch isn't appropriate}\6
\C{the \.{/P:256} sets file protection to `dump never'}\7
\4\&{function}\1\ \37$\\{w\_open\_in}(\mathop{\&{var}}\|f:\\{word\_file})$: %
\37\\{boolean};\C{open a word file for input}\2\6
\&{begin} \37$\\{reset}(\|f,\39\\{name\_of\_file},\39\.{\'/O/N:9\'})$;\5
$\\{w\_open\_in}\K\\{reset\_OK}(\|f)$;\6
\&{end};\7
\4\&{function}\1\ \37$\\{w\_open\_out}(\mathop{\&{var}}\|f:\\{word\_file})$: %
\37\\{boolean};\C{open a word file for output}\2\6
\&{begin} \37$\\{rewrite}(\|f,\39\\{name\_of\_file},\39\.{\'/O/N:9\'})$;\5
$\\{w\_open\_out}\K\\{rewrite\_OK}(\|f)$;\6
\&{end};\par
\fi
\M28. Files can be closed with the \ph\ routine `$\\{close}(\|f)$', which
should be used when all input or output with respect to \|f has been completed.
This makes \|f available to be opened again, if desired; and if \|f was used
for
output, the \\{close} operation makes the corresponding external file appear
on the user's area, ready to be read.
These procedures should not generate error messages if a file is
being closed before it has been successfully opened.
\Y\P\4\&{procedure}\1\ \37$\\{a\_close}(\mathop{\&{var}}\|f:\\{alpha\_file})$;%
\C{close a text file}\2\6
\&{begin} \37$\\{close}(\|f)$;\6
\&{end};\7
\4\&{procedure}\1\ \37$\\{b\_close}(\mathop{\&{var}}\|f:\\{byte\_file})$;%
\C{close a binary file}\2\6
\&{begin} \37$\\{close}(\|f)$;\6
\&{end};\7
\4\&{procedure}\1\ \37$\\{w\_close}(\mathop{\&{var}}\|f:\\{word\_file})$;%
\C{close a word file}\2\6
\&{begin} \37$\\{close}(\|f)$;\6
\&{end};\par
\fi
\M29. Binary input and output are done with \PASCAL's ordinary \\{get} and %
\\{put}
procedures, so we don't have to make any other special arrangements for
binary~I/O. Text output is also easy to do with standard \PASCAL\ routines.
The treatment of text input is more difficult, however, because
of the necessary translation to \\{ASCII\_code} values, and because
\TeX's conventions should be efficient and they should
blend nicely with the user's operating environment.
\fi
\M30\*. Input from text files is read one line at a time, using a routine
called
\\{input\_ln}. This function is defined in terms of global variables
called \\{buffer}, \\{first}, and \\{last} that will be described in detail
later; for now, it suffices for us to know that \\{buffer} is an array of
\\{ASCII\_code} values, and that \\{first} and \\{last} are indices into this
array representing the beginning and ending of a line of text.
We will read the lines first into an auxiliary buffer, in order to
save the running time of procedure-call overhead. This uses a nice
feature of \ph\ that Knuth chose not to mention in \TeX82.
At {\mc SAIL} we want to recognize page marks (indicated by \\{form\_feed}
characters), and keep track of the current page number.
\Y\P\D \37$\\{form\_feed}=\O{14}$\C{ASCII code used at end of a page}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{buffer}: \37\&{array} $[0\to\\{buf\_size}]$ \1\&{of}\5
\\{ASCII\_code};\C{lines of characters being read}\2\6
\4\\{first}: \37$0\to\\{buf\_size}$;\C{the first unused position in \\{buffer}}%
\6
\4\\{last}: \37$0\to\\{buf\_size}$;\C{end of the line just input to \\{buffer}}%
\6
\4\\{max\_buf\_stack}: \37$0\to\\{buf\_size}$;\C{largest index used in %
\\{buffer}}\6
\4\\{aux\_buf}: \37\&{array} $[0\to70]$ \1\&{of}\5
\\{text\_char};\C{where the characters go first}\2\par
\fi
\M31\*. The \\{input\_ln} function brings the next line of input from the
specified
file into available positions of the buffer array and returns the value
\\{true}, unless the file has already been entirely read, in which case it
returns \\{false} and sets $\\{last}\K\\{first}$. In general, the \\{ASCII%
\_code}
numbers that represent the next line of the file are input into
$\\{buffer}[\\{first}]$, $\\{buffer}[\\{first}+1]$, \dots, $\\{buffer}[%
\\{last}-1]$; and the
global variable \\{last} is set equal to \\{first} plus the length of the
line. Trailing blanks are removed from the line; thus, either $\\{last}=%
\\{first}$
(in which case the line was entirely blank) or $\\{buffer}[\\{last}-1]\I\.{"\
"}$.
An overflow error is given, however, if the normal actions of \\{input\_ln}
would make $\\{last}\G\\{buf\_size}$; this is done so that other parts of \TeX\
can safely look at the contents of $\\{buffer}[\\{last}+1]$ without
overstepping
the bounds of the \\{buffer} array. Upon entry to \\{input\_ln}, the condition
$\\{first}<\\{buf\_size}$ will always hold, so that there is always room for an
``empty'' line.
The variable \\{max\_buf\_stack}, which is used to keep track of how large
the \\{buf\_size} parameter must be to accommodate the present job, is
also kept up to date by \\{input\_ln}.
If the \\{bypass\_eoln} parameter is \\{true}, \\{input\_ln} will do a \\{get}
before looking at the first character of the line; this skips over
an \\{eoln} that was in $\|f\↑$. The procedure does not do a \\{get} when it
reaches the end of the line; therefore it can be used to acquire input
from the user's terminal as well as from ordinary text files.
Standard \PASCAL\ says that a file should have \\{eoln} immediately
before \\{eof}, but \TeX\ needs only a weaker restriction: If \\{eof}
occurs in the middle of a line, the system function \\{eoln} should return
a \\{true} result (even though $\|f\↑$ will be undefined).
Since the inner loop of \\{input\_ln} is part of \TeX's ``inner loop''---each
character of input comes in at this place---it is wise to reduce system
overhead by making use of special routines that read in an entire array
of characters at once, if such routines are available. The following
code uses standard \PASCAL\ to illustrate what needs to be done, but
finer tuning is often possible at well-developed \PASCAL\ sites.
\Y\P\4\&{function}\1\ \37$\\{input\_ln}(\mathop{\&{var}}\|f:\\{alpha\_file};\,%
\35\\{bypass\_eoln}:\\{boolean})$: \37\\{boolean};\C{inputs the next line or
returns \\{false}}\6
\4\&{var} \37\\{last\_nonblank}: \37$0\to\\{buf\_size}$;\C{\\{last} with
trailing blanks removed}\2\6
\&{begin} \37\&{if} $\\{bypass\_eoln}$ \1\&{then}\6
\&{if} $\R\\{eof}(\|f)$ \1\&{then}\5
$\\{get}(\|f)$;\C{input the first character of the line into $\|f\↑$}\2\2\6
$\\{last}\K\\{first}$;\C{cf.\ Matthew 19\thinspace:\thinspace30}\6
\&{if} $\\{eof}(\|f)$ \1\&{then}\5
$\\{input\_ln}\K\\{false}$\6
\4\&{else} \&{begin} \37$\\{last\_nonblank}\K\\{first}$;\6
\&{while} $\R\\{eoln}(\|f)$ \1\&{do}\6
\&{begin} \37\&{if} $\\{last}\G\\{max\_buf\_stack}$ \1\&{then}\6
\&{begin} \37$\\{max\_buf\_stack}\K\\{last}+1$;\6
\&{if} $\\{max\_buf\_stack}=\\{buf\_size}$ \1\&{then}\5
$\\{overflow}(\.{"buffer\ size"},\39\\{buf\_size})$;\2\6
\&{end};\2\6
$\\{write\_ln}(\\{tty},\39\|f\↑,\39\\{ord}(\|f\↑):1)$;\6
\&{if} $\\{ord}(\|f\↑)=\O{33}$ \1\&{then}\6
\&{begin} \37$\\{get}(\|f)$;\5
$\\{write\_ln}(\\{tty},\39\.{"then\ "},\39\|f\↑,\39\\{ord}(\|f\↑):1)$;\6
\&{if} $(\\{ord}(\|f\↑)\G\.{"@"})\W(\\{ord}(\|f\↑)\L\.{"\_"})$ \1\&{then}\6
\&{begin} \37$\\{buffer}[\\{last}]\K\\{xord}[\\{chr}(\\{ord}(\|f\↑)-\O{100})]$;%
\5
$\\{get}(\|f)$;\6
\&{end}\6
\4\&{else} $\\{buffer}[\\{last}]\K\\{invalid\_code}$;\2\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{buffer}[\\{last}]\K\\{xord}[\|f\↑]$;\5
$\\{get}(\|f)$;\6
\&{end};\2\6
$\\{incr}(\\{last})$;\6
\&{if} $\\{buffer}[\\{last}-1]\I\.{"\ "}$ \1\&{then}\5
$\\{last\_nonblank}\K\\{last}$;\2\6
\&{end};\2\6
$\\{last}\K\\{last\_nonblank}$;\5
$\\{input\_ln}\K\\{true}$;\6
\&{end};\2\6
\&{end};\par
\fi
\M32\*. The user's terminal acts essentially like other files of text, except
that it is used both for input and for output. When the terminal is
considered an input file, the file variable is called \\{term\_in}, and when it
is considered an output file the file variable is \\{term\_out}.
On WAITS, this point is moot, since we use the built-in \\{TTY} file.
\Y\P\D \37$\\{term\_in}\S\\{TTY}$\C{the terminal as an input file}\par
\P\D \37$\\{term\_out}\S\\{TTY}$\C{the terminal as an output file}\par
\fi
\M33\*. Here is how to open the terminal files on WAITS: we don't do anything,
since \\{TTY} is always open. Note that $\\{eoln}(\\{term\_in})$ is initially %
\\{true}.
\Y\P\D \37$\\{t\_open\_in}\S\\{do\_nothing}$\C{open the terminal for text
input}\par
\P\D \37$\\{t\_open\_out}\S\\{do\_nothing}$\C{open the terminal for text
output}\par
\fi
\M34\*. Sometimes it is necessary to synchronize the input/output mixture that
happens on the user's terminal, and three procedures are used for this
purpose. The first of these, \\{update\_terminal}, is called when we want
to make sure that everything we have output to the terminal so far has
actually left the computer's internal buffers and been sent.
The second, \\{clear\_terminal}, is called when we wish to cancel any
input that the user may have typed ahead (since we are about to
issue an unexpected error message). The third, \\{wake\_up\_terminal},
is supposed to revive the terminal if the user has disabled it by
some instruction to the operating system. The following macros show how
these operations can be specified in \ph:
\Y\P\D \37$\\{update\_terminal}\S\\{break}(\\{term\_out})$\C{empty the terminal
output buffer}\par
\P\D \37$\\{clear\_terminal}\S\\{break\_in}(\\{term\_in},\39\\{true})$\C{clear
the terminal input buffer}\par
\P\D \37$\\{wake\_up\_terminal}\S$\1\6
\&{begin} \37\&{if} $\\{inskp0}$ \1\&{then}\2\6
\&{end}\C{cancel the user's cancellation of output}\2\par
\Y\P$\4\X34\*:Error handling procedures\X\S$\6
\4\&{function}\1\ \37\\{inskp0}: \37\\{boolean};\5
\\{extern};\par
\A sections~78, 81, 82, 93, 94, and~95.
\U section~4.\fi
\M35. We need a special routine to read the first line of \TeX\ input from
the user's terminal. This line is different because it is read before we
have opened the transcript file; there is sort of a ``chicken and
egg'' problem here. If the user types `\.{\\input paper}' on the first
line, or if some macro invoked by that line does such an \.{\\input},
the transcript file will be named `\.{paper.log}'; but if no \.{\\input}
commands are performed during the first line of terminal input, the transcript
file will acquire its default name `\.{texput.log}'. (The transcript file
will not contain error messages generated by the first line before the
first \.{\\input} command.)
The first line is even more special if we are lucky enough to have an operating
system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object
program. It's nice to let the user start running a \TeX\ job by typing
a command line like `\.{tex paper}'; in such a case, \TeX\ will operate
as if the first line of input were `\.{paper}', i.e., the first line will
consist of the remainder of the command line, after the part that invoked
\TeX.
\fi
\M36. Different systems have different ways to get started. But regardless of
what conventions are adopted, the routine that initializes the terminal
should satisfy the following specifications:
\yskip\textindent{1)}It should open file \\{term\_in} for input from the
terminal. (The file \\{term\_out} will already be open for output to the
terminal.)
\textindent{2)}If the user has given a command line, this line should be
considered the first line of terminal input. Otherwise the
user should be prompted with `\.{**}', and the first line of input
should be whatever is typed in response.
\textindent{3)}The first line of input, which might or might not be a
command line, should appear in locations \\{first} to $\\{last}-1$ of the
\\{buffer} array.
\textindent{4)}The global variable \\{loc} should be set so that the
character that \TeX\ reads next is in $\\{buffer}[\\{loc}]$. This
character should not be blank, and we should have $\\{loc}<\\{last}$.
\yskip\noindent(It may be necessary to prompt the user several times
before a non-blank line comes in. The prompt is `\.{**}' instead of the
later `\.*' because the meaning is slightly different: `\.{\\input}' need
not be typed immediately after~`\.{**}'.)
\Y\P\D \37$\\{loc}\S\\{cur\_input}.\\{loc\_field}$\C{location of first unread
character in \\{buffer}}\par
\fi
\M37\*. The following program does the required initialization
and accepts interrupts and also retrieves a possible command line, using
new system routines due to David R. Fuchs.
\Y\P\D \37$\\{pto\_chr}(\#)\S\\{ptwr1w}(0,\39\\{ord}(\#))$\C{put a character in
the line editor}\par
\Y\P\4\&{procedure}\1\ \37$\\{esci}(\mathop{\&{var}}\|x:\\{integer})$;\5
\\{extern};\5
\hbox{\2}\C{increments \|x each time the user types escape-I or break-I; the
program can change \|x whenever it wants to, but \|x had better be a global
variable}\7
\4\&{function}\1\ \37\\{rescan}: \37\\{boolean};\5
\\{extern};\5
\hbox{\2}\C{puts the command line into the terminal buffer, or returns %
\\{false} if there was no command line}\7
\4\&{function}\1\ \37$\\{tmp\_in}(\|f:\\{string};\,\35\mathop{\&{var}}\|s:%
\\{string})$: \37\\{integer};\5
\\{extern};\5
\hbox{\2}\C{reads \.{TMPCOR} file \|f into \|s, and returns its length ($\L0$
means error)}\7
\4\&{function}\1\ \37\\{cclsw}: \37\\{boolean};\5
\\{extern};\5
\hbox{\2}\C{was program started with \.{RUN} offset of 1 (i.e., from %
\.{SNAIL})?}\7
\4\&{procedure}\1\ \37$\\{ptwr1w}(\\{pty},\39\|c:\\{integer})$;\5
\\{extern};\5
\hbox{\2}\C{simulates typing of a character on a \.{PTY}}\7
\4\&{function}\1\ \37\\{init\_terminal}: \37\\{boolean};\C{gets the terminal
files started}\6
\4\&{label} \37\\{exit};\6
\4\&{var} \37\|l: \37\\{integer};\C{length returned by \\{tmp\_in}}\6
\\{line\_found}: \37\\{boolean};\C{have we scanned a line?}\6
\\{tmp\_cor\_buf}: \37\&{packed} \37\&{array} $[0\to100]$ \1\&{of}\5
\\{char};\C{where \\{tmp\_in} puts things}\2\2\6
\&{begin} \37\\{t\_open\_in};\5
$\\{esci}(\\{interrupt})$;\5
$\\{last}\K\\{first}$;\6
\&{if} $\\{cclsw}$ \1\&{then}\C{started by \.{TEX} monitor command}\6
\&{begin} \37$\|l\K\\{tmp\_in}(\.{\'TEX\'},\39\\{tmp\_cor\_buf})$;\5
$\\{loc}\K1$;\6
\&{while} $(\\{loc}<\|l)\W(\\{tmp\_cor\_buf}[\\{loc}]\I\.{\'←\'})$ \1\&{do}\5
$\\{incr}(\\{loc})$;\2\6
$\\{incr}(\\{loc})$;\6
\&{while} $\\{loc}<\|l$ \1\&{do}\6
\&{begin} \37\&{if} $\\{tmp\_cor\_buf}[\\{loc}]>\.{\'\ \'}$ \1\&{then}\6
\&{begin} \37$\\{buffer}[\\{last}]\K\\{xord}[\\{tmp\_cor\_buf}[\\{loc}]]$;\5
$\\{incr}(\\{last})$;\6
\&{end};\2\6
$\\{incr}(\\{loc})$;\6
\&{end};\2\6
\&{end}\6
\4\&{else} \&{debug} \37\&{if} $\\{false}$ \1\&{then}\ \2\6
\&{gubed}\2\6
\&{if} $\\{rescan}$ \1\&{then}\6
\&{begin} \37$\\{read\_ln}(\\{term\_in})$;\C{get first character into $\\{term%
\_in}\↑$}\6
\&{while} $(\R\\{eoln}(\\{term\_in}))\W(\\{term\_in}\↑\I\.{\';\'})$ \1\&{do}\5
$\\{get}(\\{term\_in})$;\2\6
\&{if} $\\{term\_in}\↑=\.{\';\'}$ \1\&{then}\6
\&{begin} \37$\\{get}(\\{term\_in})$;\6
\&{while} $\R\\{eoln}(\\{term\_in})$ \1\&{do}\6
\&{begin} \37$\\{buffer}[\\{last}]\K\\{xord}[\\{term\_in}\↑]$;\5
$\\{incr}(\\{last})$;\5
$\\{get}(\\{term\_in})$;\6
\&{end};\2\6
\&{end};\2\6
\&{end};\2\6
$\\{line\_found}\K(\\{last}>\\{first})$;\6
\~ \1\&{loop}\ \&{begin} \37$\\{loc}\K\\{first}$;\6
\&{while} $(\\{loc}<\\{last})\W(\\{buffer}[\\{loc}]=\.{"\ "})$ \1\&{do}\5
$\\{incr}(\\{loc})$;\2\6
\&{if} $\\{loc}<\\{last}$ \1\&{then}\6
\&{begin} \37$\\{init\_terminal}\K\\{true}$;\5
\&{return};\C{return unless the line was all blank}\6
\&{end};\2\6
\&{if} $\\{line\_found}$ \1\&{then}\5
$\\{write\_ln}(\\{term\_out},\39\.{\'Please\ type\ the\ name\ of\ your\ input\
file.\'})$;\2\6
\\{wake\_up\_terminal};\5
$\\{write}(\\{term\_out},\39\.{\'**\'})$;\5
\\{update\_terminal};\5
$\\{buffer}[\\{first}]\K0$;\C{\\{input\_ln} may look at $\\{buffer}[%
\\{first}]$}\6
\&{if} $\R\\{input\_ln}(\\{term\_in},\39\\{true})$ \1\&{then}\C{this shouldn't
happen}\6
\&{begin} \37$\\{write\_ln}(\\{term\_out})$;\5
$\\{write}(\\{term\_out},\39\.{\'!\ End\ of\ file\ on\ the\ terminal...\ why?%
\'})$;\5
$\\{init\_terminal}\K\\{false}$;\5
\&{return};\6
\&{end};\2\6
$\\{line\_found}\K\\{true}$;\6
\&{end};\2\6
\4\\{exit}: \37\&{end};\par
\fi
\N38. \[4] String handling.
Control sequence names and diagnostic messages are variable-length strings
of seven-bit characters. Since \PASCAL\ does not have a well-developed string
mechanism, \TeX\ does all of its string processing by homegrown methods.
Elaborate facilities for dynamic strings are not needed, so all of the
necessary operations can be handled with a simple data structure.
The array \\{str\_pool} contains all of the (seven-bit) ASCII codes in all
of the strings, and the array \\{str\_start} contains indices of the starting
points of each string. Strings are referred to by integer numbers, so that
string number \|s comprises the characters $\\{str\_pool}[\|j]$ for
$\\{str\_start}[\|s]\L\|j<\\{str\_start}[\|s+1]$. Additional integer variables
\\{pool\_ptr} and \\{str\_ptr} indicate the number of entries used so far
in \\{str\_pool} and \\{str\_start}, respectively; locations
$\\{str\_pool}[\\{pool\_ptr}]$ and $\\{str\_start}[\\{str\_ptr}]$ are
ready for the next string to be allocated.
String numbers 0 to 127 are reserved for strings that correspond to single
ASCII characters. This is in accordance with the conventions of \.{WEB},
which converts single-character strings into the ASCII code number of the
single character involved, while it converts other strings into integers
and builds a string pool file. Thus, when the string constant \.{"."} appears
in the program below, \.{WEB} converts it into the integer 46, which is the
ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"}
into some integer greater than~127. String number 46 will presumably be the
single character `\..'; but some ASCII codes have no standard visible
representation, and \TeX\ sometimes needs to be able to print an arbitrary
ASCII character, so the first 128 strings are used to specify exactly what
should be printed for each of the 128 possibilities.
Elements of the \\{str\_pool} array must be ASCII codes that can actually be
printed; i.e., they must have an \\{xchr} equivalent in the local
character set. (However, the names of control sequences need not meet
this restriction, when they appear in \\{str\_pool}.)
\Y\P$\4\X18:Types in the outer block\X\mathrel{+}\S$\6
$\\{pool\_pointer}=0\to\\{pool\_size}$;\C{for variables that point into \\{str%
\_pool}}\6
$\\{str\_number}=0\to\\{max\_strings}$;\C{for variables that point into \\{str%
\_start}}\par
\fi
\M39. \P$\X13:Global variables\X\mathrel{+}\S$\6
\4\\{str\_pool}: \37\&{packed} \37\&{array} $[\\{pool\_pointer}]$ \1\&{of}\5
\\{ASCII\_code};\C{the characters}\2\6
\4\\{str\_start}: \37\&{array} $[\\{str\_number}]$ \1\&{of}\5
\\{pool\_pointer};\C{the starting pointers}\2\6
\4\\{pool\_ptr}: \37\\{pool\_pointer};\C{first unused position in \\{str%
\_pool}}\6
\4\\{str\_ptr}: \37\\{str\_number};\C{number of the current string being
created}\6
\4\\{init\_pool\_ptr}: \37\\{pool\_pointer};\C{the starting value of \\{pool%
\_ptr}}\6
\4\\{init\_str\_ptr}: \37\\{str\_number};\C{the starting value of \\{str\_ptr}}%
\par
\fi
\M40. Several of the elementary string operations are performed using \.{WEB}
macros instead of using \PASCAL\ procedures, because many of the
operations are done quite frequently and we want to avoid the
overhead of procedure calls. For example, here is
a simple macro that computes the length of a string.
\Y\P\D \37$\\{length}(\#)\S(\\{str\_start}[\#+1]-\\{str\_start}[\#])$\C{the
number of characters in string number \#}\par
\fi
\M41. The length of the current string is called \\{cur\_length}:
\Y\P\D \37$\\{cur\_length}\S(\\{pool\_ptr}-\\{str\_start}[\\{str\_ptr}])$\par
\fi
\M42. Strings are created by appending character codes to \\{str\_pool}.
The macro called \\{append\_char}, defined here, does not check to see if the
value of \\{pool\_ptr} has gotten too high; this test is supposed to be
made before \\{append\_char} is used. There is also a \\{flush\_char}
macro, which erases the last character appended.
To test if there is room to append \|l more characters to \\{str\_pool},
we shall write $\\{str\_room}(\|l)$, which aborts \TeX\ and gives an
apologetic error message if there isn't enough room.
\Y\P\D \37$\\{append\_char}(\#)\S$\C{put \\{ASCII\_code} \# at the end of %
\\{str\_pool}}\6
\&{begin} \37$\\{str\_pool}[\\{pool\_ptr}]\K\#$;\5
$\\{incr}(\\{pool\_ptr})$;\6
\&{end}\par
\P\D \37$\\{flush\_char}\S\\{decr}(\\{pool\_ptr})$\C{forget the last character
in the pool}\par
\P\D \37$\\{str\_room}(\#)\S$\C{make sure that the pool hasn't overflowed}\6
\&{begin} \37\&{if} $\\{pool\_ptr}+\#>\\{pool\_size}$ \1\&{then}\5
$\\{overflow}(\.{"pool\ size"},\39\\{pool\_size}-\\{init\_pool\_ptr})$;\2\6
\&{end}\par
\fi
\M43. Once a sequence of characters has been appended to \\{str\_pool}, it
officially becomes a string when the function \\{make\_string} is called.
This function returns the identification number of the new string as its
value.
\Y\P\4\&{function}\1\ \37\\{make\_string}: \37\\{str\_number};\C{current
string enters the pool}\2\6
\&{begin} \37\&{if} $\\{str\_ptr}=\\{max\_strings}$ \1\&{then}\5
$\\{overflow}(\.{"number\ of\ strings"},\39\\{max\_strings}-\\{init\_str%
\_ptr})$;\2\6
$\\{incr}(\\{str\_ptr})$;\5
$\\{str\_start}[\\{str\_ptr}]\K\\{pool\_ptr}$;\5
$\\{make\_string}\K\\{str\_ptr}-1$;\6
\&{end};\par
\fi
\M44. To destroy the most recently made string, we say \\{flush\_string}.
\Y\P\D \37$\\{flush\_string}\S$\1\6
\&{begin} \37$\\{decr}(\\{str\_ptr})$;\5
$\\{pool\_ptr}\K\\{str\_start}[\\{str\_ptr}]$;\6
\&{end}\2\par
\fi
\M45. The following subroutine compares string \|s with another string of the
same length that appears in \\{buffer} starting at position \|k;
the result is \\{true} if and only if the strings are equal.
Empirical tests indicate that \\{str\_eq\_buf} is used in such a way that
it tends to return \\{true} about 80 percent of the time.
\Y\P\4\&{function}\1\ \37$\\{str\_eq\_buf}(\|s:\\{str\_number};\,\35\|k:%
\\{integer})$: \37\\{boolean};\C{test equality of strings}\6
\4\&{label} \37\\{not\_found};\C{loop exit}\6
\4\&{var} \37\|j: \37\\{pool\_pointer};\C{running index}\6
\\{result}: \37\\{boolean};\C{result of comparison}\2\6
\&{begin} \37$\|j\K\\{str\_start}[\|s]$;\6
\&{while} $\|j<\\{str\_start}[\|s+1]$ \1\&{do}\6
\&{begin} \37\&{if} $\\{str\_pool}[\|j]\I\\{buffer}[\|k]$ \1\&{then}\6
\&{begin} \37$\\{result}\K\\{false}$;\5
\&{goto} \37\\{not\_found};\6
\&{end};\2\6
$\\{incr}(\|j)$;\5
$\\{incr}(\|k)$;\6
\&{end};\2\6
$\\{result}\K\\{true}$;\6
\4\\{not\_found}: \37$\\{str\_eq\_buf}\K\\{result}$;\6
\&{end};\par
\fi
\M46. Here is a similar routine, but it compares two strings in the string
pool,
and it does not assume that they have the same length.
\Y\P\4\&{function}\1\ \37$\\{str\_eq\_str}(\|s,\39\|t:\\{str\_number})$: \37%
\\{boolean};\C{test equality of strings}\6
\4\&{label} \37\\{not\_found};\C{loop exit}\6
\4\&{var} \37$\|j,\39\|k$: \37\\{pool\_pointer};\C{running indices}\6
\\{result}: \37\\{boolean};\C{result of comparison}\2\6
\&{begin} \37$\\{result}\K\\{false}$;\6
\&{if} $\\{length}(\|s)\I\\{length}(\|t)$ \1\&{then}\5
\&{goto} \37\\{not\_found};\2\6
$\|j\K\\{str\_start}[\|s]$;\5
$\|k\K\\{str\_start}[\|t]$;\6
\&{while} $\|j<\\{str\_start}[\|s+1]$ \1\&{do}\6
\&{begin} \37\&{if} $\\{str\_pool}[\|j]\I\\{str\_pool}[\|k]$ \1\&{then}\5
\&{goto} \37\\{not\_found};\2\6
$\\{incr}(\|j)$;\5
$\\{incr}(\|k)$;\6
\&{end};\2\6
$\\{result}\K\\{true}$;\6
\4\\{not\_found}: \37$\\{str\_eq\_str}\K\\{result}$;\6
\&{end};\par
\fi
\M47. The initial values of \\{str\_pool}, \\{str\_start}, \\{pool\_ptr},
and \\{str\_ptr} are computed by the \.{INITEX} program, based in part
on the information that \.{WEB} has output while processing \TeX.
\Y\P\&{init} \37\&{function}\1\ \37\\{get\_strings\_started}: \37\\{boolean};%
\C{initializes the string pool, but returns \\{false} if something goes wrong}%
\6
\4\&{label} \37$\\{done},\39\\{exit}$;\6
\4\&{var} \37$\|k,\39\|l$: \37$0\to127$;\C{small indices or counters}\6
$\|m,\39\|n$: \37\\{text\_char};\C{characters input from \\{pool\_file}}\6
\|g: \37\\{str\_number};\C{garbage}\6
\|a: \37\\{integer};\C{accumulator for check sum}\6
\|c: \37\\{boolean};\C{check sum has been checked}\2\6
\&{begin} \37$\\{pool\_ptr}\K0$;\5
$\\{str\_ptr}\K0$;\5
$\\{str\_start}[0]\K0$;\5
\X48:Make the first 128 strings\X;\6
\X51:Read the other strings from the \.{TEX.POOL} file and return \\{true}, or
give an error message and return \\{false}\X;\6
\4\\{exit}: \37\&{end};\6
\&{tini}\par
\fi
\M48. \P$\X48:Make the first 128 strings\X\S$\6
\&{for} $\|k\K0\mathrel{\&{to}}127$ \1\&{do}\6
\&{begin} \37\&{if} $(\X49\*:Character \|k cannot be printed\X)$ \1\&{then}\6
\&{begin} \37$\\{append\_char}(\.{"\↑"})$;\5
$\\{append\_char}(\.{"\↑"})$;\6
\&{if} $\|k<\O{100}$ \1\&{then}\5
$\\{append\_char}(\|k+\O{100})$\6
\4\&{else} $\\{append\_char}(\|k-\O{100})$;\2\6
\&{end}\6
\4\&{else} $\\{append\_char}(\|k)$;\2\6
$\|g\K\\{make\_string}$;\6
\&{end}\2\par
\U section~47.\fi
\M49\*. The first 128 strings will contain 95 standard ASCII characters, and
the
other 33 characters will be printed in three-symbol form like `\.{\↑\↑A}'
unless a system-dependent change is made here. Installations that have
an extended character set, where for example $\\{xchr}[\O{32}]=\hbox{\.{\'↑↑Z%
\'}}$,
would like string \O{32} to be the single character \O{32} instead of the
three characters \O{136}, \O{136}, \O{132} (\.{\↑\↑Z}). On the other hand,
even people with an extended character set will want to represent string
\O{15} by \.{\↑\↑M}, since \O{15} is \\{carriage\_return}; the idea is to
produce visible strings instead of tabs or line-feeds or carriage-returns
or bell-rings or characters that are treated anomalously in text files.
The boolean expression defined here should be \\{true} unless \TeX\ internal
code number~\|k corresponds to a non-troublesome visible symbol in the
local character set. At MIT, for example, the appropriate formula would
be `$\|k\in[0,\O{10}\to\O{12},\O{14},\O{15},\O{33},\O{177}]$'. If character %
\|k cannot be
printed, then character $\|k+\O{100}$ or $\|k-\O{100}$ must be printable; thus,
at
least 64 printable characters are needed.
\Y\P$\4\X49\*:Character \|k cannot be printed\X\S$\6
$(\|k<\.{"\ "})\V(\|k>\.{"\~"})$\par
\U section~48.\fi
\M50. When the \.{WEB} system program called \.{TANGLE} processes the %
\.{TEX.WEB}
description that you are now reading, it outputs the \PASCAL\ program
\.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX}
program reads the latter file, where each string appears as a two-digit decimal
length followed by the string itself, and the information is recorded in
\TeX's string memory.
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\&{init} \37\\{pool\_file}: \37\\{alpha\_file};\C{the string-pool file output
by \.{TANGLE}}\6
\&{tini}\par
\fi
\M51. \P\D \37$\\{bad\_pool}(\#)\S$\1\6
\&{begin} \37\\{wake\_up\_terminal};\5
$\\{write\_ln}(\\{term\_out},\39\#)$;\5
$\\{a\_close}(\\{pool\_file})$;\5
$\\{get\_strings\_started}\K\\{false}$;\5
\&{return};\6
\&{end}\2\par
\Y\P$\4\X51:Read the other strings from the \.{TEX.POOL} file and return %
\\{true}, or give an error message and return \\{false}\X\S$\6
$\\{name\_of\_file}\K\\{pool\_name}$;\C{we needn't set \\{name\_length}}\6
\&{if} $\\{a\_open\_in}(\\{pool\_file})$ \1\&{then}\6
\&{begin} \37$\|c\K\\{false}$;\6
\1\&{repeat} \37\X52:Read one string, but return \\{false} if the string memory
space is getting too tight for comfort\X;\6
\4\&{until}\5
\|c;\2\6
$\\{a\_close}(\\{pool\_file})$;\5
$\\{get\_strings\_started}\K\\{true}$;\6
\&{end}\6
\4\&{else} $\\{bad\_pool}(\.{\'!\ I\ can\'}\.{\'t\ read\ TEX.POOL.\'})$\2\par
\U section~47.\fi
\M52. \P$\X52:Read one string, but return \\{false} if the string memory space
is getting too tight for comfort\X\S$\6
\&{begin} \37\&{if} $\\{eof}(\\{pool\_file})$ \1\&{then}\5
$\\{bad\_pool}(\.{\'!\ TEX.POOL\ has\ no\ check\ sum.\'})$;\2\6
$\\{read}(\\{pool\_file},\39\|m,\39\|n)$;\C{read two digits of string length}\6
\&{if} $\|m=\.{\'*\'}$ \1\&{then}\5
\X53:Check the pool check sum\X\6
\4\&{else} \&{begin} \37\&{if} $(\\{xord}[\|m]<\.{"0"})\V(\\{xord}[\|m]>%
\.{"9"})\V\30(\\{xord}[\|n]<\.{"0"})\V(\\{xord}[\|n]>\.{"9"})$ \1\&{then}\5
$\\{bad\_pool}(\.{\'!\ TEX.POOL\ line\ doesn\'}\.{\'t\ begin\ with\ two\
digits.\'})$;\2\6
$\|l\K\\{xord}[\|m]\ast10+\\{xord}[\|n]-\.{"0"}\ast11$;\C{compute the length}\6
\&{if} $\\{pool\_ptr}+\|l+\\{string\_vacancies}>\\{pool\_size}$ \1\&{then}\5
$\\{bad\_pool}(\.{\'!\ You\ have\ to\ increase\ POOLSIZE.\'})$;\2\6
\&{for} $\|k\K1\mathrel{\&{to}}\|l$ \1\&{do}\6
\&{begin} \37\&{if} $\\{eoln}(\\{pool\_file})$ \1\&{then}\5
$\|m\K\.{\'\ \'}$\ \&{else} $\\{read}(\\{pool\_file},\39\|m)$;\2\6
$\\{append\_char}(\\{xord}[\|m])$;\6
\&{end};\2\6
$\\{read\_ln}(\\{pool\_file})$;\5
$\|g\K\\{make\_string}$;\6
\&{end};\2\6
\&{end}\par
\U section~51.\fi
\M53. The \.{WEB} operation \.{@\$} denotes the value that should be at the
end of this \.{TEX.POOL} file; any other value means that the wrong pool
file has been loaded.
\Y\P$\4\X53:Check the pool check sum\X\S$\6
\&{begin} \37$\|a\K0$;\5
$\|k\K1$;\6
\~ \1\&{loop}\ \&{begin} \37\&{if} $(\\{xord}[\|n]<\.{"0"})\V(\\{xord}[\|n]>%
\.{"9"})$ \1\&{then}\5
$\\{bad\_pool}(\.{\'!\ TEX.POOL\ check\ sum\ doesn\'}\.{\'t\ have\ nine\
digits.\'})$;\2\6
$\|a\K10\ast\|a+\\{xord}[\|n]-\.{"0"}$;\6
\&{if} $\|k=9$ \1\&{then}\5
\&{goto} \37\\{done};\2\6
$\\{incr}(\|k)$;\5
$\\{read}(\\{pool\_file},\39\|n)$;\6
\&{end};\2\6
\4\\{done}: \37\&{if} $\|a\I\)$ \1\&{then}\5
$\\{bad\_pool}(\.{\'!\ TEX.POOL\ doesn\'}\.{\'t\ match;\ TANGLE\ me\ again.%
\'})$;\2\6
$\|c\K\\{true}$;\6
\&{end}\par
\U section~52.\fi
\N54. \[5] On-line and off-line printing.
Messages that are sent to a user's terminal and to the transcript-log file
are produced by several `\\{print}' procedures. These procedures will
direct their output to a variety of places, based on the setting of
the global variable \\{selector}, which has the following possible
values:
\yskip
\hang \\{term\_and\_log}, the normal setting, prints on the terminal and on the
transcript file.
\hang \\{log\_only}, prints only on the transcript file.
\hang \\{term\_only}, prints only on the terminal.
\hang \\{no\_print}, doesn't print at all. This is used only in rare cases
before the transcript file is open.
\hang \\{pseudo}, puts output into a cyclic buffer that is used
by the \\{show\_context} routine; see that routine for the
reasoning behind this curious mode.
\hang \\{new\_string}, appends the output to the current string in the
string pool.
\hang 0 to 15, prints on one of the sixteen files for \.{\\write} output.
\yskip
\noindent The symbolic names `\\{term\_and\_log}', etc., have been assigned
numeric codes that satisfy the convenient relations $\\{no\_print}+1=\\{term%
\_only}$,
$\\{no\_print}+2=\\{log\_only}$, $\\{term\_only}+2=\\{log\_only}+1=\\{term\_and%
\_log}$.
Three additional global variables, \\{tally} and \\{term\_offset} and
\\{file\_offset}, record the number of characters that have been printed
since they were most recently cleared to zero. We use \\{tally} to record
the length of (possibly very long) stretches of printing; \\{term\_offset}
and \\{file\_offset}, on the other hand, keep track of how many characters
have appeared so far on the current line that has been output to the
terminal or to the transcript file, respectively.
\Y\P\D \37$\\{no\_print}=16$\C{\\{selector} setting that makes data disappear}%
\par
\P\D \37$\\{term\_only}=17$\C{printing is destined for the terminal only}\par
\P\D \37$\\{log\_only}=18$\C{printing is destined for the transcript file only}%
\par
\P\D \37$\\{term\_and\_log}=19$\C{normal \\{selector} setting}\par
\P\D \37$\\{pseudo}=20$\C{special \\{selector} setting for \\{show\_context}}%
\par
\P\D \37$\\{new\_string}=21$\C{printing is deflected to the string pool}\par
\P\D \37$\\{max\_selector}=21$\C{highest selector setting}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{log\_file}: \37\\{alpha\_file};\C{transcript of \TeX\ session}\6
\4\\{selector}: \37$0\to\\{max\_selector}$;\C{where to print a message}\6
\4\\{dig}: \37\&{array} $[0\to22]$ \1\&{of}\5
$0\to15$;\C{digits in a number being output}\2\6
\4\\{tally}: \37\\{integer};\C{the number of characters recently printed}\6
\4\\{term\_offset}: \37$0\to\\{max\_print\_line}$;\C{the number of characters
on the current terminal line}\6
\4\\{file\_offset}: \37$0\to\\{max\_print\_line}$;\C{the number of characters
on the current file line}\6
\4\\{trick\_buf}: \37\&{array} $[0\to\\{error\_line}]$ \1\&{of}\5
\\{ASCII\_code};\C{circular buffer for pseudoprinting}\2\6
\4\\{trick\_count}: \37\\{integer};\C{threshold for pseudoprinting, explained
later}\6
\4\\{first\_count}: \37\\{integer};\C{another variable for pseudoprinting}\par
\fi
\M55. \P$\X55:Initialize the output routines\X\S$\6
$\\{selector}\K\\{term\_only}$;\5
$\\{tally}\K0$;\5
$\\{term\_offset}\K0$;\5
$\\{file\_offset}\K0$;\par
\A sections~61, 528, and~533.
\U section~1332.\fi
\M56. Macro abbreviations for output to the terminal and to the log file are
defined here for convenience. Some systems need special conventions
for terminal output, and it is possible to adhere to those conventions
by changing \\{wterm}, \\{wterm\_ln}, and \\{wterm\_cr} in this section.
\Y\P\D \37$\\{wterm}(\#)\S\\{write}(\\{term\_out},\39\#)$\par
\P\D \37$\\{wterm\_ln}(\#)\S\\{write\_ln}(\\{term\_out},\39\#)$\par
\P\D \37$\\{wterm\_cr}\S\\{write\_ln}(\\{term\_out})$\par
\P\D \37$\\{wlog}(\#)\S\\{write}(\\{log\_file},\39\#)$\par
\P\D \37$\\{wlog\_ln}(\#)\S\\{write\_ln}(\\{log\_file},\39\#)$\par
\P\D \37$\\{wlog\_cr}\S\\{write\_ln}(\\{log\_file})$\par
\fi
\M57. To end a line of text output, we call \\{print\_ln}.
\Y\P$\4\X57:Basic printing procedures\X\S$\6
\4\&{procedure}\1\ \37\\{print\_ln};\C{prints an end-of-line}\2\6
\&{begin} \37\&{case} $\\{selector}$ \1\&{of}\6
\4\\{term\_and\_log}: \37\&{begin} \37\\{wterm\_cr};\5
\\{wlog\_cr};\5
$\\{term\_offset}\K0$;\5
$\\{file\_offset}\K0$;\6
\&{end};\6
\4\\{log\_only}: \37\&{begin} \37\\{wlog\_cr};\5
$\\{file\_offset}\K0$;\6
\&{end};\6
\4\\{term\_only}: \37\&{begin} \37\\{wterm\_cr};\5
$\\{term\_offset}\K0$;\6
\&{end};\6
\4$\\{no\_print},\39\\{pseudo},\39\\{new\_string}$: \37\\{do\_nothing};\6
\4\&{othercases} \37$\\{write\_ln}(\\{write\_file}[\\{selector}])$\2\6
\&{endcases};\6
\&{end};\C{\\{tally} is not affected}\par
\A sections~58, 59, 60, 62, 63, 64, 65, 262, 263, 518\*, 699, and~1355.
\U section~4.\fi
\M58. The \\{print\_char} procedure sends one character to the desired
destination,
using the \\{xchr} array to map it into an external character compatible with
\\{input\_ln}. All printing comes through \\{print\_ln} or \\{print\_char}.
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{print\_char}(\|s:\\{ASCII\_code})$;\C{prints a
single character}\6
\4\&{label} \37\\{exit};\2\6
\&{begin} \37\&{if} $\X244:Character \|s is the current new-line character\X$ %
\1\&{then}\6
\&{if} $\\{selector}<\\{pseudo}$ \1\&{then}\6
\&{begin} \37\\{print\_ln};\5
\&{return};\6
\&{end};\2\2\6
\&{case} $\\{selector}$ \1\&{of}\6
\4\\{term\_and\_log}: \37\&{begin} \37$\\{wterm}(\\{xchr}[\|s])$;\5
$\\{wlog}(\\{xchr}[\|s])$;\5
$\\{incr}(\\{term\_offset})$;\5
$\\{incr}(\\{file\_offset})$;\6
\&{if} $\\{term\_offset}=\\{max\_print\_line}$ \1\&{then}\6
\&{begin} \37\\{wterm\_cr};\5
$\\{term\_offset}\K0$;\6
\&{end};\2\6
\&{if} $\\{file\_offset}=\\{max\_print\_line}$ \1\&{then}\6
\&{begin} \37\\{wlog\_cr};\5
$\\{file\_offset}\K0$;\6
\&{end};\2\6
\&{end};\6
\4\\{log\_only}: \37\&{begin} \37$\\{wlog}(\\{xchr}[\|s])$;\5
$\\{incr}(\\{file\_offset})$;\6
\&{if} $\\{file\_offset}=\\{max\_print\_line}$ \1\&{then}\5
\\{print\_ln};\2\6
\&{end};\6
\4\\{term\_only}: \37\&{begin} \37$\\{wterm}(\\{xchr}[\|s])$;\5
$\\{incr}(\\{term\_offset})$;\6
\&{if} $\\{term\_offset}=\\{max\_print\_line}$ \1\&{then}\5
\\{print\_ln};\2\6
\&{end};\6
\4\\{no\_print}: \37\\{do\_nothing};\6
\4\\{pseudo}: \37\&{if} $\\{tally}<\\{trick\_count}$ \1\&{then}\5
$\\{trick\_buf}[\\{tally}\mathbin{\&{mod}}\\{error\_line}]\K\|s$;\2\6
\4\\{new\_string}: \37\&{begin} \37\&{if} $\\{pool\_ptr}<\\{pool\_size}$ \1%
\&{then}\5
$\\{append\_char}(\|s)$;\2\6
\&{end};\C{we drop characters if the string space is full}\6
\4\&{othercases} \37$\\{write}(\\{write\_file}[\\{selector}],\39\\{xchr}[\|s])$%
\2\6
\&{endcases};\6
$\\{incr}(\\{tally})$;\6
\4\\{exit}: \37\&{end};\par
\fi
\M59. An entire string is output by calling \\{print}. Note that if we are
outputting
the single standard ASCII character \.c, we could call $\\{print}(\.{"c"})$,
since
$\.{"c"}=99$ is the number of a single-character string, as explained above.
But
$\\{print\_char}(\.{"c"})$ is quicker, so \TeX\ goes directly to the \\{print%
\_char}
routine when it knows that this is safe. (The present implementation
assumes that it is always safe to print a visible ASCII character.)
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{print}(\|s:\\{integer})$;\C{prints string \|s}\6
\4\&{label} \37\\{exit};\6
\4\&{var} \37\|j: \37\\{pool\_pointer};\C{current character code position}\2\6
\&{begin} \37\&{if} $\|s\G\\{str\_ptr}$ \1\&{then}\5
$\|s\K\.{"???"}$\C{this can't happen}\6
\4\&{else} \&{if} $\|s<128$ \1\&{then}\6
\&{if} $\|s<0$ \1\&{then}\5
$\|s\K\.{"???"}$\C{can't happen}\6
\4\&{else} \&{if} $(\X244:Character \|s is the current new-line character\X)$ %
\1\&{then}\6
\&{if} $\\{selector}<\\{pseudo}$ \1\&{then}\6
\&{begin} \37\\{print\_ln};\5
\&{return};\6
\&{end};\2\2\2\2\2\6
$\|j\K\\{str\_start}[\|s]$;\6
\&{while} $\|j<\\{str\_start}[\|s+1]$ \1\&{do}\6
\&{begin} \37$\\{print\_char}(\\{str\_pool}[\|j])$;\5
$\\{incr}(\|j)$;\6
\&{end};\2\6
\4\\{exit}: \37\&{end};\par
\fi
\M60. Control sequence names might contain \\{ASCII\_code} values that can't
be printed using \\{print\_char}. Therefore we use \\{slow\_print} for them:
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{slow\_print}(\|s:\\{integer})$;\C{prints string \|s}%
\6
\4\&{label} \37\\{exit};\6
\4\&{var} \37\|j: \37\\{pool\_pointer};\C{current character code position}\2\6
\&{begin} \37\&{if} $\|s\G\\{str\_ptr}$ \1\&{then}\5
$\|s\K\.{"???"}$\C{this can't happen}\6
\4\&{else} \&{if} $\|s<128$ \1\&{then}\6
\&{if} $\|s<0$ \1\&{then}\5
$\|s\K\.{"???"}$\C{can't happen}\6
\4\&{else} \&{if} $(\X244:Character \|s is the current new-line character\X)$ %
\1\&{then}\6
\&{if} $\\{selector}<\\{pseudo}$ \1\&{then}\6
\&{begin} \37\\{print\_ln};\5
\&{return};\6
\&{end};\2\2\2\2\2\6
$\|j\K\\{str\_start}[\|s]$;\6
\&{while} $\|j<\\{str\_start}[\|s+1]$ \1\&{do}\6
\&{begin} \37$\\{print}(\\{str\_pool}[\|j])$;\5
$\\{incr}(\|j)$;\6
\&{end};\2\6
\4\\{exit}: \37\&{end};\par
\fi
\M61. Here is the very first thing that \TeX\ prints: a headline that
identifies
the version number and format package. The \\{term\_offset} variable is
temporarily
incorrect, but the discrepancy is not serious since we assume that the banner
and format identifier together will occupy at most \\{max\_print\_line}
character positions.
\Y\P$\4\X55:Initialize the output routines\X\mathrel{+}\S$\6
$\\{wterm}(\\{banner})$;\6
\&{if} $\\{format\_ident}=0$ \1\&{then}\5
$\\{wterm\_ln}(\.{\'\ (no\ format\ preloaded)\'})$\6
\4\&{else} \&{begin} \37$\\{print}(\\{format\_ident})$;\5
\\{print\_ln};\6
\&{end};\2\6
\\{update\_terminal};\par
\fi
\M62. The procedure \\{print\_nl} is like \\{print}, but it makes sure that the
string appears at the beginning of a new line.
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{print\_nl}(\|s:\\{str\_number})$;\C{prints string %
\|s at beginning of line}\2\6
\&{begin} \37\&{if} $((\\{term\_offset}>0)\W(\\{odd}(\\{selector})))\V\30((%
\\{file\_offset}>0)\W(\\{selector}\G\\{log\_only}))$ \1\&{then}\5
\\{print\_ln};\2\6
$\\{print}(\|s)$;\6
\&{end};\par
\fi
\M63. The procedure \\{print\_esc} prints a string that is preceded by
the user's escape character (which is usually a backslash).
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{print\_esc}(\|s:\\{str\_number})$;\C{prints escape
character, then \|s}\6
\4\&{var} \37\|c: \37\\{integer};\C{the escape character code}\2\6
\&{begin} \37\X243:Set variable \|c to the current escape character\X;\6
\&{if} $\|c\G0$ \1\&{then}\6
\&{if} $\|c<128$ \1\&{then}\5
$\\{print}(\|c)$;\2\2\6
$\\{print}(\|s)$;\6
\&{end};\par
\fi
\M64. An array of digits in the range $0\to15$ is printed by \\{print\_the%
\_digs}.
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{print\_the\_digs}(\|k:\\{eight\_bits})$;\C{prints $%
\\{dig}[\|k-1]$$\,\ldots\,$$\\{dig}[0]$}\2\6
\&{begin} \37\&{while} $\|k>0$ \1\&{do}\6
\&{begin} \37$\\{decr}(\|k)$;\6
\&{if} $\\{dig}[\|k]<10$ \1\&{then}\5
$\\{print\_char}(\.{"0"}+\\{dig}[\|k])$\6
\4\&{else} $\\{print\_char}(\.{"A"}-10+\\{dig}[\|k])$;\2\6
\&{end};\2\6
\&{end};\par
\fi
\M65. The following procedure, which prints out the decimal representation of a
given integer \|n, has been written carefully so that it works properly
if $\|n=0$ or if $(-\|n)$ would cause overflow. It does not apply $\mathbin{%
\&{mod}}$ or $\mathbin{\&{div}}$
to negative arguments, since such operations are not implemented consistently
by all \PASCAL\ compilers.
\Y\P$\4\X57:Basic printing procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{print\_int}(\|n:\\{integer})$;\C{prints an integer
in decimal form}\6
\4\&{var} \37\|k: \37$0\to23$;\C{index to current digit; we assume that $%
\|n<10↑{23}$}\6
\|m: \37\\{integer};\C{used to negate \|n in possibly dangerous cases}\2\6
\&{begin} \37$\|k\K0$;\6
\&{if} $\|n<0$ \1\&{then}\6
\&{begin} \37$\\{print\_char}(\.{"-"})$;\6
\&{if} $\|n>-100000000$ \1\&{then}\5
$\\{negate}(\|n)$\6
\4\&{else} \&{begin} \37$\|m\K-1-\|n$;\5
$\|n\K\|m\mathbin{\&{div}}10$;\5
$\|m\K(\|m\mathbin{\&{mod}}10)+1$;\5
$\|k\K1$;\6
\&{if} $\|m<10$ \1\&{then}\5
$\\{dig}[0]\K\|m$\6
\4\&{else} \&{begin} \37$\\{dig}[0]\K0$;\5
$\\{incr}(\|n)$;\6
\&{end};\2\6
\&{end};\2\6
\&{end};\2\6
\1\&{repeat} \37$\\{dig}[\|k]\K\|n\mathbin{\&{mod}}10$;\5
$\|n\K\|n\mathbin{\&{div}}10$;\5
$\\{incr}(\|k)$;\6
\4\&{until}\5
$\|n=0$;\2\6
$\\{print\_the\_digs}(\|k)$;\6
\&{end};\par
\fi
\M66. Here is a trivial procedure to print two digits; it is usually called
with
a parameter in the range $0\L\|n\L99$.
\Y\P\4\&{procedure}\1\ \37$\\{print\_two}(\|n:\\{integer})$;\C{prints two
least significant digits}\2\6
\&{begin} \37$\|n\K\\{abs}(\|n)\mathbin{\&{mod}}100$;\5
$\\{print\_char}(\.{"0"}+(\|n\mathbin{\&{div}}10))$;\5
$\\{print\_char}(\.{"0"}+(\|n\mathbin{\&{mod}}10))$;\6
\&{end};\par
\fi
\M67. Hexadecimal printing of nonnegative integers is accomplished by \\{print%
\_hex}.
\Y\P\4\&{procedure}\1\ \37$\\{print\_hex}(\|n:\\{integer})$;\C{prints a
positive integer in hexadecimal form}\6
\4\&{var} \37\|k: \37$0\to22$;\C{index to current digit; we assume that $0\L
n<16↑{22}$}\2\6
\&{begin} \37$\|k\K0$;\5
$\\{print\_char}(\.{""}\.{""})$;\6
\1\&{repeat} \37$\\{dig}[\|k]\K\|n\mathbin{\&{mod}}16$;\5
$\|n\K\|n\mathbin{\&{div}}16$;\5
$\\{incr}(\|k)$;\6
\4\&{until}\5
$\|n=0$;\2\6
$\\{print\_the\_digs}(\|k)$;\6
\&{end};\par
\fi
\M68. In certain situations, \TeX\ prints either a standard visible ASCII
character or its hexadecimal ASCII code.
\Y\P\4\&{procedure}\1\ \37$\\{print\_ASCII}(\|c:\\{integer})$;\C{prints a
character or its code}\2\6
\&{begin} \37\&{if} $(\|c\G0)\W(\|c\L127)$ \1\&{then}\5
$\\{print}(\|c)$\6
\4\&{else} \&{begin} \37$\\{print\_char}(\.{"["})$;\6
\&{if} $\|c<0$ \1\&{then}\5
$\\{print\_int}(\|c)$\ \&{else} $\\{print\_hex}(\|c)$;\2\6
$\\{print\_char}(\.{"]"})$;\6
\&{end};\2\6
\&{end};\par
\fi
\M69. Roman numerals are produced by the \\{print\_roman\_int} routine.
Readers
who like puzzles might enjoy trying to figure out how this tricky code
works; therefore no explanation will be given.
\Y\P\4\&{procedure}\1\ \37$\\{print\_roman\_int}(\|n:\\{integer})$;\6
\4\&{label} \37\\{exit};\6
\4\&{var} \37$\|j,\39\|k$: \37\\{pool\_pointer};\C{mysterious indices into %
\\{str\_pool}}\6
$\|u,\39\|v$: \37\\{nonnegative\_integer};\C{mysterious numbers}\2\6
\&{begin} \37$\|j\K\\{str\_start}[\.{"m2d5c2l5x2v5i"}]$;\5
$\|v\K1000$;\6
\~ \1\&{loop}\ \&{begin} \37\&{while} $\|n\G\|v$ \1\&{do}\6
\&{begin} \37$\\{print\_char}(\\{str\_pool}[\|j])$;\5
$\|n\K\|n-\|v$;\6
\&{end};\2\6
\&{if} $\|n\L0$ \1\&{then}\5
\&{return};\C{nonpositive input produces no output}\2\6
$\|k\K\|j+2$;\5
$\|u\K\|v\mathbin{\&{div}}(\\{str\_pool}[\|k-1]-\.{"0"})$;\6
\&{if} $\\{str\_pool}[\|k-1]=\.{"2"}$ \1\&{then}\6
\&{begin} \37$\|k\K\|k+2$;\5
$\|u\K\|u\mathbin{\&{div}}(\\{str\_pool}[\|k-1]-\.{"0"})$;\6
\&{end};\2\6
\&{if} $\|n+\|u\G\|v$ \1\&{then}\6
\&{begin} \37$\\{print\_char}(\\{str\_pool}[\|k])$;\5
$\|n\K\|n+\|u$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\|j\K\|j+2$;\5
$\|v\K\|v\mathbin{\&{div}}(\\{str\_pool}[\|j-1]-\.{"0"})$;\6
\&{end};\2\6
\&{end};\2\6
\4\\{exit}: \37\&{end};\par
\fi
\M70. The \\{print} subroutine will not print a string that is still being
created. The following procedure will.
\Y\P\4\&{procedure}\1\ \37\\{print\_current\_string};\C{prints a yet-unmade
string}\6
\4\&{var} \37\|j: \37\\{pool\_pointer};\C{points to current character code}\2\6
\&{begin} \37$\|j\K\\{str\_start}[\\{str\_ptr}]$;\6
\&{while} $\|j<\\{pool\_ptr}$ \1\&{do}\6
\&{begin} \37$\\{print\_char}(\\{str\_pool}[\|j])$;\5
$\\{incr}(\|j)$;\6
\&{end};\2\6
\&{end};\par
\fi
\M71\*. Here is a procedure that asks the user to type a line of input,
assuming that the \\{selector} setting is either \\{term\_only} or \\{term\_and%
\_log}.
The input is placed into locations \\{first} through $\\{last}-1$ of the
\\{buffer} array, and echoed on the transcript file if appropriate.
This procedure is never called when $\\{interaction}<\\{scroll\_mode}$.
\Y\P\D \37$\\{prompt\_input}(\#)\S$\1\6
\&{begin} \37\\{wake\_up\_terminal};\5
$\\{print}(\#)$;\5
\\{term\_input};\6
\&{end}\C{prints a string and gets a line of input}\2\par
\Y\P\4\&{procedure}\1\ \37\\{term\_input};\C{gets a line from the terminal}\6
\4\&{var} \37\|k: \37$0\to\\{buf\_size}$;\C{index into \\{buffer}}\2\6
\&{begin} \37\\{update\_terminal};\C{Now the user sees the prompt for sure}\6
$\\{buffer}[\\{first}]\K0$;\C{makes sure \\{input\_ln} doesn't find a \\{form%
\_feed}}\6
\&{if} $\R\\{input\_ln}(\\{term\_in},\39\\{true})$ \1\&{then}\5
$\\{fatal\_error}(\.{"End\ of\ file\ on\ the\ terminal!"})$;\2\6
$\\{term\_offset}\K0$;\C{the user's line ended with \<\rm return>}\6
$\\{decr}(\\{selector})$;\C{prepare to echo the input}\6
\&{if} $\\{last}\I\\{first}$ \1\&{then}\6
\&{for} $\|k\K\\{first}\mathrel{\&{to}}\\{last}-1$ \1\&{do}\5
$\\{print}(\\{buffer}[\|k])$;\2\2\6
\\{print\_ln};\5
$\\{incr}(\\{selector})$;\C{restore previous status}\6
\&{end};\par
\fi
\N72. \[6] Reporting errors.
When something anomalous is detected, \TeX\ typically does something like this:
$$\vbox{\halign{#\hfil\cr
$\\{print\_err}(\.{"Something\ anomalous\ has\ been\ detected"})$;\cr
$\\{help3}(\.{"This\ is\ the\ first\ line\ of\ my\ offer\ to\ help."})$\cr
$(\.{"This\ is\ the\ second\ line.\ I\'m\ trying\ to"})$\cr
$(\.{"explain\ the\ best\ way\ for\ you\ to\ proceed."})$;\cr
\\{error};\cr}}$$
A two-line help message would be given using \\{help2}, etc.; these informal
helps should use simple vocabulary that complements the words used in the
official error message that was printed. (Outside of the U.S.A., the help
messages should preferably be translated into the local vernacular. Each
line of help is at most 60 characters long, in the present implementation,
so that \\{max\_print\_line} will not be exceeded.)
The \\{print\_err} procedure supplies a `\.!' before the official message,
and makes sure that the terminal is awake if a stop is going to occur.
The \\{error} procedure supplies a `\..' after the official message, then it
shows the location of the error; and if $\\{interaction}=\\{error\_stop%
\_mode}$,
it also enters into a dialog with the user, during which time the help
message may be printed.
\fi
\M73. The global variable \\{interaction} has four settings, representing
increasing
amounts of user interaction:
\Y\P\D \37$\\{batch\_mode}=0$\C{omits all stops and omits terminal output}\par
\P\D \37$\\{nonstop\_mode}=1$\C{omits all stops}\par
\P\D \37$\\{scroll\_mode}=2$\C{omits error stops}\par
\P\D \37$\\{error\_stop\_mode}=3$\C{stops at every opportunity to interact}\par
\P\D \37$\\{print\_err}(\#)\S$\1\6
\&{begin} \37\&{if} $\\{interaction}=\\{error\_stop\_mode}$ \1\&{then}\5
\\{wake\_up\_terminal};\2\6
$\\{print\_nl}(\.{"!\ "})$;\5
$\\{print}(\#)$;\6
\&{end}\2\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{interaction}: \37$\\{batch\_mode}\to\\{error\_stop\_mode}$;\C{current
level of interaction}\par
\fi
\M74. \P$\X21:Set initial values of key variables\X\mathrel{+}\S$\6
$\\{interaction}\K\\{error\_stop\_mode}$;\par
\fi
\M75. \TeX\ is careful not to call \\{error} when the print \\{selector}
setting
might be unusual. The only possible values of \\{selector} at the time of
error messages are
\yskip\hang\\{no\_print} (when $\\{interaction}=\\{batch\_mode}$
and \\{log\_file} not yet open);
\hang\\{term\_only} (when $\\{interaction}>\\{batch\_mode}$ and \\{log\_file}
not yet open);
\hang\\{log\_only} (when $\\{interaction}=\\{batch\_mode}$ and \\{log\_file} is
open);
\hang\\{term\_and\_log} (when $\\{interaction}>\\{batch\_mode}$ and \\{log%
\_file} is open).
\Y\P$\4\X75:Initialize the print \\{selector} based on \\{interaction}\X\S$\6
\&{if} $\\{interaction}=\\{batch\_mode}$ \1\&{then}\5
$\\{selector}\K\\{no\_print}$\ \&{else} $\\{selector}\K\\{term\_only}$\2\par
\U sections~1265 and~1337.\fi
\M76. A global variable \\{deletions\_allowed} is set \\{false} if the \\{get%
\_next}
routine is active when \\{error} is called; this ensures that \\{get\_next}
and related routines like \\{get\_token} will never be called recursively.
The global variable \\{history} records the worst level of error that
has been detected. It has four possible values: \\{spotless}, \\{warning%
\_issued},
\\{error\_message\_issued}, and \\{fatal\_error\_stop}.
Another global variable, \\{error\_count}, is increased by one when an
\\{error} occurs without an interactive dialog, and it is reset to zero at
the end of every paragraph. If \\{error\_count} reaches 100, \TeX\ decides
that there is no point in continuing further.
\Y\P\D \37$\\{spotless}=0$\C{\\{history} value when nothing has been amiss yet}%
\par
\P\D \37$\\{warning\_issued}=1$\C{\\{history} value when \\{begin\_diagnostic}
has been called}\par
\P\D \37$\\{error\_message\_issued}=2$\C{\\{history} value when \\{error} has
been called}\par
\P\D \37$\\{fatal\_error\_stop}=3$\C{\\{history} value when termination was
premature}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{deletions\_allowed}: \37\\{boolean};\C{is it safe for \\{error} to call %
\\{get\_token}?}\6
\4\\{history}: \37$\\{spotless}\to\\{fatal\_error\_stop}$;\C{has the source
input been clean so far?}\6
\4\\{error\_count}: \37$-1\to100$;\C{the number of scrolled errors since the
last paragraph ended}\par
\fi
\M77. The value of \\{history} is initially \\{fatal\_error\_stop}, but it will
be changed to \\{spotless} if \TeX\ survives the initialization process.
\Y\P$\4\X21:Set initial values of key variables\X\mathrel{+}\S$\6
$\\{deletions\_allowed}\K\\{true}$;\5
$\\{error\_count}\K0$;\C{\\{history} is initialized elsewhere}\par
\fi
\M78. Since errors can be detected almost anywhere in \TeX, we want to declare
the
error procedures near the beginning of the program. But the error procedures
in turn use some other procedures, which need to be declared \\{forward}
before we get to \\{error} itself.
It is possible for \\{error} to be called recursively if some error arises
when \\{get\_token} is being used to delete a token, or if some fatal error
occurs while \TeX\ is trying to fix a non-fatal one. But such recursion
is never more than one level deep.
\Y\P$\4\X34\*:Error handling procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37\\{normalize\_selector};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{get\_token};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{term\_input};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{show\_context};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{begin\_file\_reading};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{open\_log\_file};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{close\_files\_and\_terminate};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{clear\_for\_error\_prompt};\5
\\{forward};\5
\hbox{\2}\6
\4\&{procedure}\1\ \37\\{give\_err\_help};\5
\\{forward};\5
\hbox{\2}\6
\hbox{\4\hskip-\fontdimen2\font}\ \&{debug} \37\ \&{procedure}\1\ \37\\{debug%
\_help};\5
\\{forward};\ \&{gubed} \par
\fi
\M79. Individual lines of help are recorded in the array \\{help\_line}, which
contains entries in positions $0\to(\\{help\_ptr}-1)$. They should be printed
in reverse order, i.e., with $\\{help\_line}[0]$ last.
\Y\P\D \37$\\{hlp1}(\#)\S\\{help\_line}[0]\K\#$;\ \&{end} \par
\P\D \37$\\{hlp2}(\#)\S\\{help\_line}[1]\K\#$;\5
\\{hlp1}\par
\P\D \37$\\{hlp3}(\#)\S\\{help\_line}[2]\K\#$;\5
\\{hlp2}\par
\P\D \37$\\{hlp4}(\#)\S\\{help\_line}[3]\K\#$;\5
\\{hlp3}\par
\P\D \37$\\{hlp5}(\#)\S\\{help\_line}[4]\K\#$;\5
\\{hlp4}\par
\P\D \37$\\{hlp6}(\#)\S\\{help\_line}[5]\K\#$;\5
\\{hlp5}\par
\P\D \37$\\{help0}\S\\{help\_ptr}\K0$\C{sometimes there might be no help}\par
\P\D \37$\\{help1}\S$\ \&{begin} \37$\\{help\_ptr}\K1$;\5
\\{hlp1}\C{use this with one help line}\par
\P\D \37$\\{help2}\S$\ \&{begin} \37$\\{help\_ptr}\K2$;\5
\\{hlp2}\C{use this with two help lines}\par
\P\D \37$\\{help3}\S$\ \&{begin} \37$\\{help\_ptr}\K3$;\5
\\{hlp3}\C{use this with three help lines}\par
\P\D \37$\\{help4}\S$\ \&{begin} \37$\\{help\_ptr}\K4$;\5
\\{hlp4}\C{use this with four help lines}\par
\P\D \37$\\{help5}\S$\ \&{begin} \37$\\{help\_ptr}\K5$;\5
\\{hlp5}\C{use this with five help lines}\par
\P\D \37$\\{help6}\S$\ \&{begin} \37$\\{help\_ptr}\K6$;\5
\\{hlp6}\C{use this with six help lines}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{help\_line}: \37\&{array} $[0\to5]$ \1\&{of}\5
\\{str\_number};\C{helps for the next \\{error}}\2\6
\4\\{help\_ptr}: \37$0\to6$;\C{the number of help lines present}\6
\4\\{use\_err\_help}: \37\\{boolean};\C{should the \\{err\_help} list be
shown?}\par
\fi
\M80. \P$\X21:Set initial values of key variables\X\mathrel{+}\S$\6
$\\{help\_ptr}\K0$;\5
$\\{use\_err\_help}\K\\{false}$;\par
\fi
\M81. The \\{jump\_out} procedure just cuts across all active procedure levels
and
goes to \\{end\_of\_TEX}. This is the only nonlocal \&{goto} statement in the
whole program. It is used when there is no recovery from a particular error.
Some \PASCAL\ compilers do not implement non-local \&{goto} statements.
In such cases the body of \\{jump\_out} should simply be
`\\{close\_files\_and\_terminate};\thinspace' followed by a call on some system
procedure that quietly terminates the program.
\Y\P$\4\X34\*:Error handling procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37\\{jump\_out};\2\6
\&{begin} \37\&{goto} \37\\{end\_of\_TEX};\6
\&{end};\par
\fi
\M82. Here now is the general \\{error} routine.
\Y\P$\4\X34\*:Error handling procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37\\{error};\C{completes the job of error reporting}\6
\4\&{label} \37$\\{continue},\39\\{exit}$;\6
\4\&{var} \37\|c: \37\\{ASCII\_code};\C{what the user types}\6
$\\{s1},\39\\{s2},\39\\{s3},\39\\{s4}$: \37\\{integer};\C{used to save global
variables when deleting tokens}\2\6
\&{begin} \37\&{if} $\\{history}<\\{error\_message\_issued}$ \1\&{then}\5
$\\{history}\K\\{error\_message\_issued}$;\2\6
$\\{print\_char}(\.{"."})$;\5
\\{show\_context};\6
\&{if} $\\{interaction}=\\{error\_stop\_mode}$ \1\&{then}\5
\X83:Get user's advice and \&{return}\X;\2\6
$\\{incr}(\\{error\_count})$;\6
\&{if} $\\{error\_count}=100$ \1\&{then}\6
\&{begin} \37$\\{print\_nl}(\.{"(That\ makes\ 100\ errors;\ please\ try\
again.)"})$;\5
$\\{history}\K\\{fatal\_error\_stop}$;\5
\\{jump\_out};\6
\&{end};\2\6
\X90:Put help message on the transcript file\X;\6
\4\\{exit}: \37\&{end};\par
\fi
\M83. \P$\X83:Get user's advice and \&{return}\X\S$\6
\~ \1\&{loop}\ \&{begin} \37\\{continue}: \37\\{clear\_for\_error\_prompt};\5
$\\{prompt\_input}(\.{"?\ "})$;\6
\&{if} $\\{last}=\\{first}$ \1\&{then}\5
\&{return};\2\6
$\|c\K\\{buffer}[\\{first}]$;\6
\&{if} $\|c\G\.{"a"}$ \1\&{then}\5
$\|c\K\|c+\.{"A"}-\.{"a"}$;\C{convert to uppercase}\2\6
\X84\*:Interpret code \|c and \&{return} if done\X;\6
\&{end}\2\par
\U section~82.\fi
\M84\*. It is desirable to provide an `\.E' option here that gives the user
an easy way to return from \TeX\ to the system editor, with the offending
line ready to be edited. The present implementation does this by loading
the line editor with the appropriate call to the editor. We treat `\.T' the
same as `\.E', because other programs on this system invoke the editor
when the user says `\.T'.
There is a secret `\.D' option available when the debugging routines have
not been commented out.
\Y\P$\4\X84\*:Interpret code \|c and \&{return} if done\X\S$\6
\&{case} $\|c$ \1\&{of}\6
\4$\.{"0"},\39\.{"1"},\39\.{"2"},\39\.{"3"},\39\.{"4"},\39\.{"5"},\39\.{"6"},%
\39\.{"7"},\39\.{"8"},\39\.{"9"}$: \37\&{if} $\\{deletions\_allowed}$ \1%
\&{then}\5
\X88:Delete $\|c-\.{"0"}$ tokens and \&{goto} \\{continue}\X;\2\6
\hbox{\4\4}\ \&{debug} \37\.{"D"}: \37\&{begin} \37\\{debug\_help};\5
\&{goto} \37\\{continue};\ \&{end};\ \&{gubed}\6
\4$\.{"E"},\39\.{"T"}$: \37\&{if} $\\{base\_ptr}>0$ \1\&{then}\6
\&{begin} \37$\\{selector}\K\\{new\_string}$;\5
$\\{pool\_ptr}\K\\{str\_start}[\\{str\_ptr}]$;\5
$\\{print}(\.{"et\ "})$;\5
$\\{print}(\\{input\_stack}[\\{base\_ptr}].\\{name\_field})$;\5
$\\{print\_char}(\.{"/"})$;\5
$\\{print\_int}(\\{page})$;\5
$\\{print}(\.{"p/"})$;\5
$\\{print\_int}(\\{line})$;\5
$\\{print\_char}(\.{"l"})$;\5
$\\{print\_char}(\\{carriage\_return})$;\6
\&{if} $\\{str\_ptr}<\\{max\_strings}$ \1\&{then}\6
\&{begin} \37$\\{pseudo\_typein}\K\\{str\_ptr}$;\5
$\\{incr}(\\{str\_ptr})$;\5
$\\{str\_start}[\\{str\_ptr}]\K\\{pool\_ptr}$;\6
\&{end};\C{\\{make\_string} not declared \\{forward}}\2\6
$\\{selector}\K\\{term\_and\_log}$;\5
$\\{interaction}\K\\{scroll\_mode}$;\5
\\{jump\_out};\6
\&{end};\2\6
\4\.{"H"}: \37\X89:Print the help information and \&{goto} \\{continue}\X;\6
\4\.{"I"}: \37\X87:Introduce new material from the terminal and \&{return}\X;\6
\4$\.{"Q"},\39\.{"R"},\39\.{"S"}$: \37\X86:Change the interaction level and %
\&{return}\X;\6
\4\.{"X"}: \37\&{begin} \37$\\{interaction}\K\\{scroll\_mode}$;\5
\\{jump\_out};\6
\&{end};\6
\4\&{othercases} \37\\{do\_nothing}\2\6
\&{endcases};\6
\X85:Print the menu of available options\X\par
\U section~83.\fi
\M85. \P$\X85:Print the menu of available options\X\S$\6
\&{begin} \37$\\{print}(\.{"Type\ <return>\ to\ proceed,\ S\ to\ scroll\ future%
\ error\ messages,"})$;\6
$\\{print\_nl}(\.{"R\ to\ run\ without\ stopping,\ Q\ to\ run\ quietly,"})$;\6
$\\{print\_nl}(\.{"I\ to\ insert\ something,\ "})$;\6
\&{if} $\\{base\_ptr}>0$ \1\&{then}\5
$\\{print}(\.{"E\ to\ edit\ your\ file,"})$;\2\6
\&{if} $\\{deletions\_allowed}$ \1\&{then}\5
$\\{print\_nl}(\.{"1\ or\ ...\ or\ 9\ to\ ignore\ the\ next\ 1\ to\ 9\ tokens\
of\ input,"})$;\2\6
$\\{print\_nl}(\.{"H\ for\ help,\ X\ to\ quit."})$;\6
\&{end}\par
\U section~84\*.\fi
\M86. Here the author of \TeX\ apologizes for making use of the numerical
relation between \.{"Q"}, \.{"R"}, \.{"S"}, and the desired interaction
settings
\\{batch\_mode}, \\{nonstop\_mode}, \\{scroll\_mode}.
\Y\P$\4\X86:Change the interaction level and \&{return}\X\S$\6
\&{begin} \37$\\{error\_count}\K0$;\5
$\\{interaction}\K\\{batch\_mode}+\|c-\.{"Q"}$;\5
$\\{print}(\.{"OK,\ entering\ "})$;\6
\&{case} $\|c$ \1\&{of}\6
\4\.{"Q"}: \37\&{begin} \37$\\{print\_esc}(\.{"batchmode"})$;\5
$\\{decr}(\\{selector})$;\6
\&{end};\6
\4\.{"R"}: \37$\\{print\_esc}(\.{"nonstopmode"})$;\6
\4\.{"S"}: \37$\\{print\_esc}(\.{"scrollmode"})$;\2\6
\&{end};\C{there are no other cases}\6
$\\{print}(\.{"..."})$;\5
\\{print\_ln};\5
\\{update\_terminal};\5
\&{return};\6
\&{end}\par
\U section~84\*.\fi
\M87. When the following code is executed, $\\{buffer}[(\\{first}+1)\to(%
\\{last}-1)]$ may
contain the material inserted by the user; otherwise another prompt will
be given. In order to understand this part of the program fully, you need
to be familiar with \TeX's input stacks.
\Y\P$\4\X87:Introduce new material from the terminal and \&{return}\X\S$\6
\&{begin} \37\\{begin\_file\_reading};\C{enter a new syntactic level for
terminal input}\6
\C{now $\\{state}=\\{mid\_line}$, so an initial blank space will count as a
blank}\6
\&{if} $\\{last}>\\{first}+1$ \1\&{then}\6
\&{begin} \37$\\{loc}\K\\{first}+1$;\5
$\\{buffer}[\\{first}]\K\.{"\ "}$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{prompt\_input}(\.{"insert>"})$;\5
$\\{loc}\K\\{first}$;\6
\&{end};\2\6
$\\{first}\K\\{last}$;\5
$\\{cur\_input}.\\{limit\_field}\K\\{last}-1$;\C{no \\{end\_line\_char} ends
this line}\6
\&{return};\6
\&{end}\par
\U section~84\*.\fi
\M88. We allow deletion of up to 99 tokens at a time.
\Y\P$\4\X88:Delete $\|c-\.{"0"}$ tokens and \&{goto} \\{continue}\X\S$\6
\&{begin} \37$\\{s1}\K\\{cur\_tok}$;\5
$\\{s2}\K\\{cur\_cmd}$;\5
$\\{s3}\K\\{cur\_chr}$;\5
$\\{s4}\K\\{align\_state}$;\5
$\\{align\_state}\K1000000$;\5
$\\{OK\_to\_interrupt}\K\\{false}$;\6
\&{if} $(\\{last}>\\{first}+1)\W(\\{buffer}[\\{first}+1]\G\.{"0"})\W(%
\\{buffer}[\\{first}+1]\L\.{"9"})$ \1\&{then}\5
$\|c\K\|c\ast10+\\{buffer}[\\{first}+1]-\.{"0"}\ast11$\6
\4\&{else} $\|c\K\|c-\.{"0"}$;\2\6
\&{while} $\|c>0$ \1\&{do}\6
\&{begin} \37\\{get\_token};\C{one-level recursive call of \\{error} is
possible}\6
$\\{decr}(\|c)$;\6
\&{end};\2\6
$\\{cur\_tok}\K\\{s1}$;\5
$\\{cur\_cmd}\K\\{s2}$;\5
$\\{cur\_chr}\K\\{s3}$;\5
$\\{align\_state}\K\\{s4}$;\5
$\\{OK\_to\_interrupt}\K\\{true}$;\5
$\\{help2}(\.{"I\ have\ just\ deleted\ some\ text,\ as\ you\ asked."})$\6
$(\.{"You\ can\ now\ delete\ more,\ or\ insert,\ or\ whatever."})$;\5
\\{show\_context};\5
\&{goto} \37\\{continue};\6
\&{end}\par
\U section~84\*.\fi
\M89. \P$\X89:Print the help information and \&{goto} \\{continue}\X\S$\6
\&{begin} \37\&{if} $\\{use\_err\_help}$ \1\&{then}\6
\&{begin} \37\\{give\_err\_help};\5
$\\{use\_err\_help}\K\\{false}$;\6
\&{end}\6
\4\&{else} \&{begin} \37\&{if} $\\{help\_ptr}=0$ \1\&{then}\5
$\\{help2}(\.{"Sorry,\ I\ don\'t\ know\ how\ to\ help\ in\ this\ situation."})$%
\2\6
$\hbox{\kern1em}(\.{"Maybe\ you\ should\ try\ asking\ a\ human?"})$;\6
\1\&{repeat} \37$\\{decr}(\\{help\_ptr})$;\5
$\\{print}(\\{help\_line}[\\{help\_ptr}])$;\5
\\{print\_ln};\6
\4\&{until}\5
$\\{help\_ptr}=0$;\2\6
\&{end};\2\6
$\\{help4}(\.{"Sorry,\ I\ already\ gave\ what\ help\ I\ could..."})$\6
$(\.{"Maybe\ you\ should\ try\ asking\ a\ human?"})$\6
$(\.{"An\ error\ might\ have\ occurred\ before\ I\ noticed\ any\ problems."})$\6
$(\.{"\`\`If\ all\ else\ fails,\ read\ the\ instructions.\'\'"})$;\6
\&{goto} \37\\{continue};\6
\&{end}\par
\U section~84\*.\fi
\M90. \P$\X90:Put help message on the transcript file\X\S$\6
\&{if} $\\{interaction}>\\{batch\_mode}$ \1\&{then}\5
$\\{decr}(\\{selector})$;\C{avoid terminal output}\2\6
\&{if} $\\{use\_err\_help}$ \1\&{then}\6
\&{begin} \37\\{print\_ln};\5
\\{give\_err\_help};\6
\&{end}\6
\4\&{else} \&{while} $\\{help\_ptr}>0$ \1\&{do}\6
\&{begin} \37$\\{decr}(\\{help\_ptr})$;\5
$\\{print\_nl}(\\{help\_line}[\\{help\_ptr}])$;\6
\&{end};\2\2\6
\\{print\_ln};\6
\&{if} $\\{interaction}>\\{batch\_mode}$ \1\&{then}\5
$\\{incr}(\\{selector})$;\C{re-enable terminal output}\2\6
\\{print\_ln}\par
\U section~82.\fi
\M91. A dozen or so error messages end with a parenthesized integer, so we
save a teeny bit of program space by declaring the following procedure:
\Y\P\4\&{procedure}\1\ \37$\\{int\_error}(\|n:\\{integer})$;\2\6
\&{begin} \37$\\{print}(\.{"\ ("})$;\5
$\\{print\_int}(\|n)$;\5
$\\{print\_char}(\.{")"})$;\5
\\{error};\6
\&{end};\par
\fi
\M92. In anomalous cases, the print selector might be in an unknown state;
the following subroutine is called to fix things just enough to keep
running a bit longer.
The normal idea of \\{batch\_mode} is that nothing at all should be written
on the terminal. However, in the unusual case that a fatal error has
occurred but no log file could be opened, we make an exception and allow
an explanatory message to be seen.
\Y\P\4\&{procedure}\1\ \37\\{normalize\_selector};\2\6
\&{begin} \37\&{if} $\\{job\_name}>0$ \1\&{then}\5
$\\{selector}\K\\{term\_and\_log}$\6
\4\&{else} $\\{selector}\K\\{term\_only}$;\2\6
\&{if} $\\{job\_name}=0$ \1\&{then}\5
\\{open\_log\_file};\2\6
\&{if} $\\{interaction}=\\{batch\_mode}$ \1\&{then}\5
$\\{decr}(\\{selector})$;\2\6
\&{end};\par
\fi
\M93. The following procedure prints \TeX's last words before dying.
\Y\P\D \37$\\{succumb}\S$\1\6
\&{begin} \37\&{if} $\\{interaction}=\\{error\_stop\_mode}$ \1\&{then}\5
$\\{interaction}\K\\{scroll\_mode}$;\C{no more interaction}\2\6
\\{error};\6
\&{debug} \37\&{if} $\\{interaction}>\\{batch\_mode}$ \1\&{then}\5
\\{debug\_help};\ \2\6
\&{gubed}\6
$\\{history}\K\\{fatal\_error\_stop}$;\5
\\{jump\_out};\C{irrecoverable error}\6
\&{end}\2\par
\Y\P$\4\X34\*:Error handling procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{fatal\_error}(\|s:\\{str\_number})$;\C{prints \|s,
and that's it}\2\6
\&{begin} \37\\{normalize\_selector};\6
$\\{print\_err}(\.{"Emergency\ stop"})$;\5
$\\{help1}(\|s)$;\5
\\{succumb};\6
\&{end};\par
\fi
\M94. Here is the most dreaded error message.
\Y\P$\4\X34\*:Error handling procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{overflow}(\|s:\\{str\_number};\,\35\|n:%
\\{integer})$;\C{stop due to finiteness}\2\6
\&{begin} \37\\{normalize\_selector};\5
$\\{print\_err}(\.{"TeX\ capacity\ exceeded,\ sorry\ ["})$;\5
$\\{print}(\|s)$;\5
$\\{print\_char}(\.{"="})$;\5
$\\{print\_int}(\|n)$;\5
$\\{print\_char}(\.{"]"})$;\5
$\\{help2}(\.{"If\ you\ really\ absolutely\ need\ more\ capacity,"})$\6
$(\.{"you\ can\ ask\ a\ wizard\ to\ enlarge\ me."})$;\5
\\{succumb};\6
\&{end};\par
\fi
\M95. The program might sometime run completely amok, at which point there is
no choice but to stop. If no previous error has been detected, that's bad
news; a message is printed that is really intended for the \TeX\
maintenance person instead of the user (unless the user has been
particularly diabolical). The index entries for `this can't happen' may
help to pinpoint the problem.
\Y\P$\4\X34\*:Error handling procedures\X\mathrel{+}\S$\6
\4\&{procedure}\1\ \37$\\{confusion}(\|s:\\{str\_number})$;\C{consistency
check violated; \|s tells where}\2\6
\&{begin} \37\\{normalize\_selector};\6
\&{if} $\\{history}<\\{error\_message\_issued}$ \1\&{then}\6
\&{begin} \37$\\{print\_err}(\.{"This\ can\'t\ happen\ ("})$;\5
$\\{print}(\|s)$;\5
$\\{print\_char}(\.{")"})$;\5
$\\{help1}(\.{"I\'m\ broken.\ Please\ show\ this\ to\ someone\ who\ can\ fix\
can\ fix"})$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{print\_err}(\.{"I\ can\'t\ go\ on\ meeting\ you\
like\ this"})$;\5
$\\{help2}(\.{"One\ of\ your\ faux\ pas\ seems\ to\ have\ wounded\ me\
deeply..."})$\6
$(\.{"in\ fact,\ I\'m\ barely\ conscious.\ Please\ fix\ it\ and\ try\
again."})$;\6
\&{end};\2\6
\\{succumb};\6
\&{end};\par
\fi
\M96. Users occasionally want to interrupt \TeX\ while it's running.
If the \PASCAL\ runtime system allows this, one can implement
a routine that sets the global variable \\{interrupt} to some nonzero value
when such an interrupt is signalled. Otherwise there is probably at least
a way to make \\{interrupt} nonzero using the \PASCAL\ debugger.
\Y\P\D \37$\\{check\_interrupt}\S$\1\6
\&{begin} \37\&{if} $\\{interrupt}\I0$ \1\&{then}\5
\\{pause\_for\_instructions};\2\6
\&{end}\2\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{interrupt}: \37\\{integer};\C{should \TeX\ pause for instructions?}\6
\4\\{OK\_to\_interrupt}: \37\\{boolean};\C{should interrupts be observed?}\par
\fi
\M97. \P$\X21:Set initial values of key variables\X\mathrel{+}\S$\6
$\\{interrupt}\K0$;\5
$\\{OK\_to\_interrupt}\K\\{true}$;\par
\fi
\M98. When an interrupt has been detected, the program goes into its
highest interaction level and lets the user have the full flexibility of
the \\{error} routine. \TeX\ checks for interrupts only at times when it is
safe to do this.
\Y\P\4\&{procedure}\1\ \37\\{pause\_for\_instructions};\2\6
\&{begin} \37\&{if} $\\{OK\_to\_interrupt}$ \1\&{then}\6
\&{begin} \37$\\{interaction}\K\\{error\_stop\_mode}$;\6
\&{if} $(\\{selector}=\\{log\_only})\V(\\{selector}=\\{no\_print})$ \1\&{then}\5
$\\{incr}(\\{selector})$;\2\6
$\\{print\_err}(\.{"Interruption"})$;\5
$\\{help3}(\.{"You\ rang?"})$\6
$(\.{"Try\ to\ insert\ some\ instructions\ for\ me\ (e.g.,\`I\\showlists%
\'),"})$\6
$(\.{"unless\ you\ just\ want\ to\ quit\ by\ typing\ \`X\'."})$;\5
$\\{deletions\_allowed}\K\\{false}$;\5
\\{error};\5
$\\{deletions\_allowed}\K\\{true}$;\5
$\\{interrupt}\K0$;\6
\&{end};\2\6
\&{end};\par
\fi
\N99. \[7] Arithmetic with scaled dimensions.
The principal computations performed by \TeX\ are done entirely in terms of
integers less than $2↑{31}$ in magnitude; and divisions are done only when both
dividend and divisor are nonnegative. Thus, the arithmetic specified in this
program can be carried out in exactly the same way on a wide variety of
computers, including some small ones. Why? Because the arithmetic
calculations need to be spelled out precisely in order to guarantee that
\TeX\ will produce identical output on different machines. If some
quantities were rounded differently in different implementations, we would
find that line breaks and even page breaks might occur in different places.
Hence the arithmetic of \TeX\ has been designed with care, and systems that
claim to be implementations of \TeX82 should follow precisely the
calculations as they appear in the present program.
(Actually there are three places where \TeX\ uses $\mathbin{\&{div}}$ with a
possibly negative
numerator. These are harmless; see $\mathbin{\&{div}}$ in the index. Also if
the user
sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic
information will involve negative-numerator division. The same remarks
apply for $\mathbin{\&{mod}}$ as well as for $\mathbin{\&{div}}$.)
\fi
\M100. Here is a routine that calculates half of an integer, using an
unambiguous convention with respect to signed odd numbers.
\Y\P\4\&{function}\1\ \37$\\{half}(\|x:\\{integer})$: \37\\{integer};\2\6
\&{begin} \37\&{if} $\\{odd}(\|x)$ \1\&{then}\5
$\\{half}\K(\|x+1)\mathbin{\&{div}}2$\6
\4\&{else} $\\{half}\K\|x\mathbin{\&{div}}2$;\2\6
\&{end};\par
\fi
\M101. Fixed-point arithmetic is done on {\sl scaled integers\/} that are
multiples
of $2↑{-16}$. In other words, a binary point is assumed to be sixteen bit
positions from the right end of a binary computer word.
\Y\P\D \37$\\{unity}\S\O{200000}$\C{$2↑{16}$, represents 1.00000}\par
\P\D \37$\\{two}\S\O{400000}$\C{$2↑{17}$, represents 2.00000}\par
\Y\P$\4\X18:Types in the outer block\X\mathrel{+}\S$\6
$\\{scaled}=\\{integer}$;\C{this type is used for scaled integers}\6
$\\{nonnegative\_integer}=0\to\O{17777777777}$;\C{$0\L x<2↑{31}$}\6
$\\{small\_number}=0\to63$;\C{this type is self-explanatory}\par
\fi
\M102. The following function is used to create a scaled integer from a decimal
fraction $(.d_0d_1\ldots d_{k-1})$, where $0\L\|k\L17$. The digit $d_i$ is
given in $\\{dig}[\|i]$, and the calculation produces a correctly rounded
result.
\Y\P\4\&{function}\1\ \37$\\{round\_decimals}(\|k:\\{small\_number})$: \37%
\\{scaled};\C{converts a decimal fraction}\6
\4\&{var} \37\|a: \37\\{integer};\C{the accumulator}\2\6
\&{begin} \37$\|a\K0$;\6
\&{while} $\|k>0$ \1\&{do}\6
\&{begin} \37$\\{decr}(\|k)$;\5
$\|a\K(\|a+\\{dig}[\|k]\ast\\{two})\mathbin{\&{div}}10$;\6
\&{end};\2\6
$\\{round\_decimals}\K(\|a+1)\mathbin{\&{div}}2$;\6
\&{end};\par
\fi
\M103. Conversely, here is a procedure analogous to \\{print\_int}. If the
output
of this procedure is subsequently read by \TeX\ and converted by the
\\{round\_decimals} routine above, it turns out that the original value will
be reproduced exactly; the ``simplest'' such decimal number is output,
but there is always at least one digit following the decimal point.
The invariant relation in the \&{repeat} loop is that a sequence of
decimal digits yet to be printed will yield the original number if and only if
they form a fraction~$f$ in the range $s-\delta\L10\cdot2↑{16}f<s$.
We can stop if and only if $f=0$ satisfies this condition; the loop will
terminate before $s$ can possibly become zero.
\Y\P\4\&{procedure}\1\ \37$\\{print\_scaled}(\|s:\\{scaled})$;\C{prints scaled
real, rounded to five digits}\6
\4\&{var} \37\\{delta}: \37\\{scaled};\C{amount of allowable inaccuracy}\2\6
\&{begin} \37\&{if} $\|s<0$ \1\&{then}\6
\&{begin} \37$\\{print\_char}(\.{"-"})$;\5
$\\{negate}(\|s)$;\C{print the sign, if negative}\6
\&{end};\2\6
$\\{print\_int}(\|s\mathbin{\&{div}}\\{unity})$;\C{print the integer part}\6
$\\{print\_char}(\.{"."})$;\5
$\|s\K10\ast(\|s\mathbin{\&{mod}}\\{unity})+5$;\5
$\\{delta}\K10$;\6
\1\&{repeat} \37\&{if} $\\{delta}>\\{unity}$ \1\&{then}\5
$\|s\K\|s+\O{100000}-(\\{delta}\mathbin{\&{div}}2)$;\C{round the last digit}\2\6
$\\{print\_char}(\.{"0"}+(\|s\mathbin{\&{div}}\\{unity}))$;\5
$\|s\K10\ast(\|s\mathbin{\&{mod}}\\{unity})$;\5
$\\{delta}\K\\{delta}\ast10$;\6
\4\&{until}\5
$\|s\L\\{delta}$;\2\6
\&{end};\par
\fi
\M104. Physical sizes that a \TeX\ user specifies for portions of documents are
represented internally as scaled points. Thus, if we define an `sp' (scaled
point) as a unit equal to $2↑{-16}$ printer's points, every dimension
inside of \TeX\ is an integer number of sp. There are exactly
4,736,286.72 sp per inch. Users are not allowed to specify dimensions
larger than $2↑{30}-1$ sp, which is a distance of about 18.892 feet (5.7583
meters); two such quantities can be added without overflow on a 32-bit
computer.
The present implementation of \TeX\ does not check for overflow when
dimensions are added or subtracted. This could be done by inserting a
few dozen tests of the form `\ignorespaces \&{if} $\|x\G\O{10000000000}$ %
\&{then} \hbox{\\{report\_overflow}}', but the chance of overflow is so remote
that
such tests do not seem worthwhile.
\TeX\ needs to do only a few arithmetic operations on scaled quantities,
other than addition and subtraction, and the following subroutines do most of
the work. A single computation might use several subroutine calls, and it is
desirable to avoid producing multiple error messages in case of arithmetic
overflow; so the routines set the global variable \\{arith\_error} to \\{true}
instead of reporting errors directly to the user. Another global variable,
\\{remainder}, holds the remainder after a division.
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{arith\_error}: \37\\{boolean};\C{has arithmetic overflow occurred
recently?}\6
\4\\{remainder}: \37\\{scaled};\C{amount subtracted to get an exact division}%
\par
\fi
\M105. The first arithmetical subroutine we need computes $nx+y$, where \|x
and~\|y are \\{scaled} and \|n is an integer.
\Y\P\4\&{function}\1\ \37$\\{nx\_plus\_y}(\|n:\\{integer};\,\35\|x,\39\|y:%
\\{scaled})$: \37\\{scaled};\2\6
\&{begin} \37\&{if} $\|n<0$ \1\&{then}\6
\&{begin} \37$\\{negate}(\|x)$;\5
$\\{negate}(\|n)$;\6
\&{end};\2\6
\&{if} $\|n=0$ \1\&{then}\5
$\\{nx\_plus\_y}\K\|y$\6
\4\&{else} \&{if} $((\|x\L(\O{7777777777}-\|y)\mathbin{\&{div}}\|n)\W(-\|x\L(%
\O{7777777777}+\|y)\mathbin{\&{div}}\|n))$ \1\&{then}\5
$\\{nx\_plus\_y}\K\|n\ast\|x+\|y$\6
\4\&{else} \&{begin} \37$\\{arith\_error}\K\\{true}$;\5
$\\{nx\_plus\_y}\K0$;\6
\&{end};\2\2\6
\&{end};\par
\fi
\M106. We also need to divide scaled dimensions by integers.
\Y\P\4\&{function}\1\ \37$\\{x\_over\_n}(\|x:\\{scaled};\,\35\|n:%
\\{integer})$: \37\\{scaled};\6
\4\&{var} \37\\{negative}: \37\\{boolean};\C{should \\{remainder} be negated?}%
\2\6
\&{begin} \37$\\{negative}\K\\{false}$;\6
\&{if} $\|n=0$ \1\&{then}\6
\&{begin} \37$\\{arith\_error}\K\\{true}$;\5
$\\{x\_over\_n}\K0$;\5
$\\{remainder}\K\|x$;\6
\&{end}\6
\4\&{else} \&{begin} \37\&{if} $\|n<0$ \1\&{then}\6
\&{begin} \37$\\{negate}(\|x)$;\5
$\\{negate}(\|n)$;\5
$\\{negative}\K\\{true}$;\6
\&{end};\2\6
\&{if} $\|x\G0$ \1\&{then}\6
\&{begin} \37$\\{x\_over\_n}\K\|x\mathbin{\&{div}}\|n$;\5
$\\{remainder}\K\|x\mathbin{\&{mod}}\|n$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{x\_over\_n}\K-((-\|x)\mathbin{\&{div}}\|n)$;\5
$\\{remainder}\K-((-\|x)\mathbin{\&{mod}}\|n)$;\6
\&{end};\2\6
\&{end};\2\6
\&{if} $\\{negative}$ \1\&{then}\5
$\\{negate}(\\{remainder})$;\2\6
\&{end};\par
\fi
\M107. Then comes the multiplication of a scaled number by a fraction $\|n/%
\|d$,
where \|n and \|d are nonnegative integers $\L\hbox{$2↑{16}$}$ and \|d is
positive. It would be too dangerous to multiply by~\|n and then divide
by~\|d, in separate operations, since overflow might well occur; and it
would be too inaccurate to divide by \|d and then multiply by \|n. Hence
this subroutine simulates 1.5-precision arithmetic.
\Y\P\4\&{function}\1\ \37$\\{xn\_over\_d}(\|x:\\{scaled};\,\35\|n,\39\|d:%
\\{integer})$: \37\\{scaled};\6
\4\&{var} \37\\{positive}: \37\\{boolean};\C{was $\|x\G0$?}\6
$\|t,\39\|u,\39\|v$: \37\\{nonnegative\_integer};\C{intermediate quantities}\2\6
\&{begin} \37\&{if} $\|x\G0$ \1\&{then}\5
$\\{positive}\K\\{true}$\6
\4\&{else} \&{begin} \37$\\{negate}(\|x)$;\5
$\\{positive}\K\\{false}$;\6
\&{end};\2\6
$\|t\K(\|x\mathbin{\&{mod}}\O{100000})\ast\|n$;\5
$\|u\K(\|x\mathbin{\&{div}}\O{100000})\ast\|n+(\|t\mathbin{\&{div}}%
\O{100000})$;\5
$\|v\K(\|u\mathbin{\&{mod}}\|d)\ast\O{100000}+(\|t\mathbin{\&{mod}}%
\O{100000})$;\6
\&{if} $\|u\mathbin{\&{div}}\|d\G\O{100000}$ \1\&{then}\5
$\\{arith\_error}\K\\{true}$\6
\4\&{else} $\|u\K\O{100000}\ast(\|u\mathbin{\&{div}}\|d)+(\|v\mathbin{\&{div}}%
\|d)$;\2\6
\&{if} $\\{positive}$ \1\&{then}\6
\&{begin} \37$\\{xn\_over\_d}\K\|u$;\5
$\\{remainder}\K\|v\mathbin{\&{mod}}\|d$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{xn\_over\_d}\K-\|u$;\5
$\\{remainder}\K-(\|v\mathbin{\&{mod}}\|d)$;\6
\&{end};\2\6
\&{end};\par
\fi
\M108. The next subroutine is used to compute the ``badness'' of glue, when a
total~\|t is supposed to be made from amounts that sum to~\|s. According
to {\sl The \TeX book}, the badness of this situation is $100(t/s)↑3$;
however, badness is simply a heuristic, so we need not squeeze out the
last drop of accuracy when computing it. All we really want is an
approximation that has similar properties.
The actual method used to compute the badness is easier to read from the
program than to describe in words. It produces an integer value that is a
reasonably close approximation to $100(t/s)↑3$, and all implementations
of \TeX\ should use precisely this method. Any badness of $2↑{13}$ or more is
treated as infinitely bad, and represented by 10000.
It is not difficult to prove that $$\hbox{$\\{badness}(\|t+1,\|s)\G\\{badness}(%
\|t,\|s)\G\\{badness}(\|t,\|s+1)$}.$$ The badness function defined here is
capable of
computing at most 1095 distinct values, but that is plenty.
\Y\P\D \37$\\{inf\_bad}=10000$\C{infinitely bad value}\par
\Y\P\4\&{function}\1\ \37$\\{badness}(\|t,\39\|s:\\{scaled})$: \37%
\\{halfword};\C{compute badness, given $\|t\G0$}\6
\4\&{var} \37\|r: \37\\{integer};\C{approximation to $\alpha t/s$, where $%
\alpha↑3\approx 100\cdot2↑{18}$}\2\6
\&{begin} \37\&{if} $\|t=0$ \1\&{then}\5
$\\{badness}\K0$\6
\4\&{else} \&{if} $\|s\L0$ \1\&{then}\5
$\\{badness}\K\\{inf\_bad}$\6
\4\&{else} \&{begin} \37\&{if} $\|t\L7230584$ \1\&{then}\5
$\|r\K(\|t\ast297)\mathbin{\&{div}}\|s$\C{$297↑3=99.94\times2↑{18}$}\6
\4\&{else} \&{if} $\|s\G1663497$ \1\&{then}\5
$\|r\K\|t\mathbin{\&{div}}(\|s\mathbin{\&{div}}297)$\6
\4\&{else} $\|r\K\|t$;\2\2\6
\&{if} $\|r>1290$ \1\&{then}\5
$\\{badness}\K\\{inf\_bad}$\C{$1290↑3<2↑{31}<1291↑3$}\6
\4\&{else} $\\{badness}\K(\|r\ast\|r\ast\|r+\O{400000})\mathbin{\&{div}}%
\O{1000000}$;\2\6
\&{end};\C{that was $r↑3/2↑{18}$, rounded to the nearest integer}\2\2\6
\&{end};\par
\fi
\M109. When \TeX\ ``packages'' a list into a box, it needs to calculate the
proportionality ratio by which the glue inside the box should stretch
or shrink. This calculation does not affect \TeX's decision making,
so the precise details of rounding, etc., in the glue calculation are not
of critical importance for the consistency of results on different computers.
We shall use the type \\{glue\_ratio} for such proportionality ratios.
A glue ratio should take the same amount of memory as an
\\{integer} (usually 32 bits) if it is to blend smoothly with \TeX's
other data structures. Thus \\{glue\_ratio} should be equivalent to
\\{short\_real} in some implementations of \PASCAL. Alternatively,
it is possible to deal with glue ratios using nothing but fixed-point
arithmetic; see {\sl TUGboat \bf3},1 (February 1982), 10--27. (But the
routines cited there must be modified to allow negative glue ratios.)
\Y\P\D \37$\\{set\_glue\_ratio\_zero}(\#)\S\#\K0.0$\C{store the representation
of zero ratio}\par
\P\D \37$\\{set\_glue\_ratio\_one}(\#)\S\#\K1.0$\C{store the representation of
unit ratio}\par
\P\D \37$\\{float}(\#)\S\#$\C{convert from \\{glue\_ratio} to type \\{real}}\par
\P\D \37$\\{unfloat}(\#)\S\#$\C{convert from \\{real} to type \\{glue\_ratio}}%
\par
\P\D \37$\\{float\_constant}(\#)\S\#.0$\C{convert \\{integer} constant to %
\\{real}}\par
\Y\P$\4\X18:Types in the outer block\X\mathrel{+}\S$\6
$\\{glue\_ratio}=\\{real}$;\C{one-word representation of a glue expansion
factor}\par
\fi
\N110\*. \[8] Packed data.
In order to make efficient use of storage space, \TeX\ bases its major data
structures on a \\{memory\_word}, which contains either a (signed) integer,
possibly scaled, or an (unsigned) \\{glue\_ratio}, or a small number of
fields that are one half or one quarter of the size used for storing
integers.
If \|x is a variable of type \\{memory\_word}, it contains up to four
fields that can be referred to as follows:
$$\vbox{\halign{\hfil#\hfil\hfil\cr
\|x&.\\{int}&(an \\{integer})\cr
\|x&.\\{sc}\qquad&(a \\{scaled} integer)\cr
\|x&.\\{gr}&(a \\{glue\_ratio})\cr
\|x.\\{hh}.\\{lh}, \|x.\\{hh}&.\\{rh}&(two halfword fields)\cr
\|x.\\{hh}.\\{b0}, \|x.\\{hh}.\\{b1}, \|x.\\{hh}&.\\{rh}&(two quarterword
fields, one halfword
field)\cr
\|x.\\{qqqq}.\\{b0}, \|x.\\{qqqq}.\\{b1}, \|x.\\{qqqq}&.\\{b2}, \|x.\\{qqqq}.%
\\{b3}\hskip-100pt
&\qquad\qquad\qquad(four quarterword fields)\cr}}$$
This is somewhat cumbersome to write, and not very readable either, but
macros will be used to make the notation shorter and more transparent.
The \PASCAL\ code below gives a formal definition of \\{memory\_word} and
its subsidiary types, using packed variant records. \TeX\ makes no
assumptions about the relative positions of the fields within a word.
Since we are assuming 32-bit integers, a halfword must contain at least
16 bits, and a quarterword must contain at least 8 bits.
But it doesn't hurt to have more bits; for example, with enough 36-bit
words you might be able to have \\{mem\_max} as large as 262142, which is
eight times as much memory as anybody had during the first four years of
\TeX's existence.
N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled
by a \PASCAL\ that packs all of the \\{memory\_word} variants into
the space of a single integer. This means, for example, that \\{glue\_ratio}
words should be \\{short\_real} instead of \\{real} on some computers. Some
\PASCAL\ compilers will pack an integer whose subrange is `$0\to255$' into
an eight-bit field, but others insist on allocating space for an additional
sign bit; on such systems you can get 256 values into a quarterword only
if the subrange is `$-128\to127$'.
The present implementation tries to accommodate as many variations as possible,
so it makes rather general assumptions. If integers having the subrange
`$\\{min\_quarterword}\to\\{max\_quarterword}$' can be packed into a
quarterword,
and if integers having the subrange `$\\{min\_halfword}\to\\{max\_halfword}$'
can be packed into a halfword, everything should work satisfactorily.
It is usually most efficient to have $\\{min\_quarterword}=\\{min%
\_halfword}=0$,
so one should try to achieve this unless it causes a severe problem.
The values defined here are recommended for most 32-bit computers,
except \\{max\_halfword} is strange (just to make the test hairier).
\Y\P\D \37$\\{min\_quarterword}=0$\C{smallest allowable value in a %
\\{quarterword}}\par
\P\D \37$\\{max\_quarterword}=255$\C{largest allowable value in a %
\\{quarterword}}\par
\P\D \37$\\{min\_halfword}\S0$\C{smallest allowable value in a \\{halfword}}\par
\P\D \37$\\{max\_halfword}\S32768$\C{largest allowable value in a \\{halfword}}%
\par
\fi
\M111. Here are the inequalities that the quarterword and halfword values
must satisfy (or rather, the inequalities that they mustn't satisfy):
\Y\P$\4\X14:Check the ``constant'' values for consistency\X\mathrel{+}\S$\6
\&{init} \37\&{if} $(\\{mem\_min}\I\\{mem\_bot})\V(\\{mem\_max}\I\\{mem\_top})$
\1\&{then}\5
$\\{bad}\K10$;\ \2\6
\&{tini}\6
\&{if} $(\\{mem\_min}>\\{mem\_bot})\V(\\{mem\_max}<\\{mem\_top})$ \1\&{then}\5
$\\{bad}\K10$;\2\6
\&{if} $(\\{min\_quarterword}>0)\V(\\{max\_quarterword}<127)$ \1\&{then}\5
$\\{bad}\K11$;\2\6
\&{if} $(\\{min\_halfword}>0)\V(\\{max\_halfword}<32767)$ \1\&{then}\5
$\\{bad}\K12$;\2\6
\&{if} $(\\{min\_quarterword}<\\{min\_halfword})\V\30(\\{max\_quarterword}>%
\\{max\_halfword})$ \1\&{then}\5
$\\{bad}\K13$;\2\6
\&{if} $(\\{mem\_min}<\\{min\_halfword})\V(\\{mem\_max}\G\\{max\_halfword})\V%
\30(\\{mem\_bot}-\\{mem\_min}>\\{max\_halfword}+1)$ \1\&{then}\5
$\\{bad}\K14$;\2\6
\&{if} $(\\{font\_base}<\\{min\_quarterword})\V(\\{font\_max}>\\{max%
\_quarterword})$ \1\&{then}\5
$\\{bad}\K15$;\2\6
\&{if} $\\{font\_max}>\\{font\_base}+256$ \1\&{then}\5
$\\{bad}\K16$;\2\6
\&{if} $(\\{save\_size}>\\{max\_halfword})\V(\\{max\_strings}>\\{max%
\_halfword})$ \1\&{then}\5
$\\{bad}\K17$;\2\6
\&{if} $\\{buf\_size}>\\{max\_halfword}$ \1\&{then}\5
$\\{bad}\K18$;\2\6
\&{if} $\\{max\_quarterword}-\\{min\_quarterword}<255$ \1\&{then}\5
$\\{bad}\K19$;\2\par
\fi
\M112\*. The operation of adding or subtracting \\{min\_quarterword} occurs
quite
frequently in \TeX, so it is convenient to abbreviate this operation
by using the macros \\{qi} and \\{qo} for input and output to and from
quarterword format.
The inner loop of \TeX\ will run faster with respect to compilers
that don't optimize expressions like `$\|x+0$' and `$\|x-0$', if these
macros are simplified in the obvious way when $\\{min\_quarterword}=0$.
So they have been simplified here in the obvious way.
\Y\P\D \37$\\{qi}(\#)\S0+\#+0$\C{to put an \\{eight\_bits} item into a
quarterword}\par
\P\D \37$\\{qo}(\#)\S0+\#-0$\C{to take an \\{eight\_bits} item from a
quarterword}\par
\P\D \37$\\{hi}(\#)\S0+\#+0$\C{to put a sixteen-bit item into a halfword}\par
\P\D \37$\\{ho}(\#)\S0+\#-0$\C{to take a sixteen-bit item from a halfword}\par
\fi
\M113. The reader should study the following definitions closely:
\Y\P\D \37$\\{sc}\S\\{int}$\C{\\{scaled} data is equivalent to \\{integer}}\par
\Y\P$\4\X18:Types in the outer block\X\mathrel{+}\S$\6
$\\{quarterword}=\\{min\_quarterword}\to\\{max\_quarterword}$;\C{1/4 of a word}%
\6
$\\{halfword}=\\{min\_halfword}\to\\{max\_halfword}$;\C{1/2 of a word}\6
$\\{two\_choices}=1\to2$;\C{used when there are two variants in a record}\6
$\\{four\_choices}=1\to4$;\C{used when there are four variants in a record}\6
$\\{two\_halves}=$\1\5
\&{packed} \37\1\&{record} \37\\{rh}: \37\\{halfword};\2\6
\&{case} $\\{two\_choices}$ \1\&{of}\6
\41: \37$(\\{lh}:\\{halfword})$;\6
\42: \37$(\\{b0}:\\{quarterword};\,\35\\{b1}:\\{quarterword})$;\2\6
\&{end};\2\6
$\\{four\_quarters}=$\1\5
\&{packed} \37\1\&{record} \37\\{b0}: \37\\{quarterword};\6
\4\\{b1}: \37\\{quarterword};\6
\4\\{b2}: \37\\{quarterword};\6
\4\\{b3}: \37\\{quarterword};\2\6
\&{end};\2\6
$\\{memory\_word}=$\1\5
\1\&{record} \37\2\6
\&{case} $\\{four\_choices}$ \1\&{of}\6
\41: \37$(\\{int}:\\{integer})$;\6
\42: \37$(\\{gr}:\\{glue\_ratio})$;\6
\43: \37$(\\{hh}:\\{two\_halves})$;\6
\44: \37$(\\{qqqq}:\\{four\_quarters})$;\2\6
\&{end};\2\6
$\\{word\_file}=$\1\5
\&{file} \1\&{of}\5
\\{memory\_word};\2\2\par
\fi
\M114. When debugging, we may want to print a \\{memory\_word} without knowing
what type it is; so we print it in all modes.
\Y\P\&{debug} \37\&{procedure}\1\ \37$\\{print\_word}(\|w:\\{memory\_word})$;%
\C{prints \|w in all ways}\2\6
\&{begin} \37$\\{print\_int}(\|w.\\{int})$;\5
$\\{print\_char}(\.{"\ "})$;\6
$\\{print\_scaled}(\|w.\\{sc})$;\5
$\\{print\_char}(\.{"\ "})$;\6
$\\{print\_scaled}(\\{round}(\\{unity}\ast\\{float}(\|w.\\{gr})))$;\5
\\{print\_ln};\6
$\\{print\_int}(\|w.\\{hh}.\\{lh})$;\5
$\\{print\_char}(\.{"="})$;\5
$\\{print\_int}(\|w.\\{hh}.\\{b0})$;\5
$\\{print\_char}(\.{":"})$;\5
$\\{print\_int}(\|w.\\{hh}.\\{b1})$;\5
$\\{print\_char}(\.{";"})$;\5
$\\{print\_int}(\|w.\\{hh}.\\{rh})$;\5
$\\{print\_char}(\.{"\ "})$;\6
$\\{print\_int}(\|w.\\{qqqq}.\\{b0})$;\5
$\\{print\_char}(\.{":"})$;\5
$\\{print\_int}(\|w.\\{qqqq}.\\{b1})$;\5
$\\{print\_char}(\.{":"})$;\5
$\\{print\_int}(\|w.\\{qqqq}.\\{b2})$;\5
$\\{print\_char}(\.{":"})$;\5
$\\{print\_int}(\|w.\\{qqqq}.\\{b3})$;\6
\&{end};\6
\&{gubed}\par
\fi
\N115. \[9] Dynamic memory allocation.
The \TeX\ system does nearly all of its own memory allocation, so that it
can readily be transported into environments that do not have automatic
facilities for strings, garbage collection, etc., and so that it can be in
control of what error messages the user receives. The dynamic storage
requirements of \TeX\ are handled by providing a large array \\{mem} in
which consecutive blocks of words are used as nodes by the \TeX\ routines.
Pointer variables are indices into this array, or into another array
called \\{eqtb} that will be explained later. A pointer variable might
also be a special flag that lies outside the bounds of \\{mem}, so we
allow pointers to assume any \\{halfword} value. The minimum halfword
value represents a null pointer. \TeX\ does not assume that $\\{mem}[\\{null}]$
exists.
\Y\P\D \37$\\{pointer}\S\\{halfword}$\C{a flag or a location in \\{mem} or %
\\{eqtb}}\par
\P\D \37$\\{null}\S\\{min\_halfword}$\C{the null pointer}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{temp\_ptr}: \37\\{pointer};\C{a pointer variable for occasional emergency
use}\par
\fi
\M116\*. The \\{mem} array is divided into two regions that are allocated
separately,
but the dividing line between these two regions is not fixed; they grow
together until finding their ``natural'' size in a particular job.
Locations less than or equal to \\{lo\_mem\_max} are used for storing
variable-length records consisting of two or more words each. This region
is maintained using an algorithm similar to the one described in exercise
2.5--19 of {\sl The Art of Computer Programming}. However, no size field
appears in the allocated nodes; the program is responsible for knowing the
relevant size when a node is freed. Locations greater than or equal to
\\{hi\_mem\_min} are used for storing one-word records; a conventional
\.{AVAIL} stack is used for allocation in this region.
Locations of \\{mem} between \\{mem\_bot} and \\{mem\_top} may be dumped as
part
of preloaded format files, by the \.{INITEX} preprocessor.
Production versions of \TeX\ may extend the memory at both ends in order to
provide more space; locations between \\{mem\_min} and \\{mem\_bot} are always
used for variable-size nodes, and locations between \\{mem\_top} and \\{mem%
\_max}
are always used for single-word nodes.
The key pointers that govern \\{mem} allocation have a prescribed order:
$$\hbox{$\\{null}\L\\{mem\_min}\L\\{mem\_bot}<\\{lo\_mem\_max}<\\{hi\_mem%
\_min}<\\{mem\_top}\L\\{mem\_end}\L\\{mem\_max}$.}$$
Empirical tests show that the present implementation of \TeX\ tends to
spend about 9\% of its running time allocating nodes, and about 6\%
deallocating them after their use.
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{lo\_mem\_max}: \37\\{pointer};\C{the largest location of variable-size
memory in use}\6
\4\\{hi\_mem\_min}: \37\\{pointer};\C{the smallest location of one-word memory
in use}\par
\fi
\M117. In order to study the memory requirements of particular applications, it
is possible to prepare a version of \TeX\ that keeps track of current and
maximum memory usage. When code between the delimiters \&{stat} $\ldots$
\&{tats} is not ``commented out,'' \TeX\ will run a bit slower but it will
report these statistics when \\{tracing\_stats} is sufficiently large.
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4$\\{var\_used},\39\\{dyn\_used}$: \37\\{integer};\C{how much memory is in
use}\par
\fi
\M118. Let's consider the one-word memory region first, since it's the
simplest. The pointer variable \\{mem\_end} holds the highest-numbered location
of \\{mem} that has ever been used. The free locations of \\{mem} that
occur between \\{hi\_mem\_min} and \\{mem\_end}, inclusive, are of type
\\{two\_halves}, and we write $\\{info}(\|p)$ and $\\{link}(\|p)$ for the %
\\{lh}
and \\{rh} fields of $\\{mem}[\|p]$ when it is of this type. The single-word
free locations form a linked list
$$\\{avail},\;\hbox{$\\{link}(\\{avail})$},\;\hbox{$\\{link}(\\{link}(%
\\{avail}))$},\;\ldots$$
terminated by \\{null}.
\Y\P\D \37$\\{link}(\#)\S\\{mem}[\#].\\{hh}.\\{rh}$\C{the \\{link} field of a
memory word}\par
\P\D \37$\\{info}(\#)\S\\{mem}[\#].\\{hh}.\\{lh}$\C{the \\{info} field of a
memory word}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{avail}: \37\\{pointer};\C{head of the list of available one-word nodes}\6
\4\\{mem\_end}: \37\\{pointer};\C{the last one-word node used in \\{mem}}\par
\fi
\M119. If memory is exhausted, it might mean that the user has forgotten
a right brace. We will define some procedures later that try to help
pinpoint the trouble.
\Y\P\X292:Declare the procedure called \\{show\_token\_list}\X\6
\X306:Declare the procedure called \\{runaway}\X\par
\fi
\M120. The function \\{get\_avail} returns a pointer to a new one-word node
whose
\\{link} field is null. However, \TeX\ will halt if there is no more room left.
If the available-space list is empty, i.e., if $\\{avail}=\\{null}$,
we try first to increase \\{mem\_end}. If that cannot be done, i.e., if
$\\{mem\_end}=\\{mem\_max}$, we try to decrease \\{hi\_mem\_min}. If that
cannot be
done, i.e., if $\\{hi\_mem\_min}=\\{low\_mem\_max}+1$, we have to quit.
\Y\P\4\&{function}\1\ \37\\{get\_avail}: \37\\{pointer};\C{single-word node
allocation}\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node being got}\2\6
\&{begin} \37$\|p\K\\{avail}$;\C{get top location in the \\{avail} stack}\6
\&{if} $\|p\I\\{null}$ \1\&{then}\5
$\\{avail}\K\\{link}(\\{avail})$\C{and pop it off}\6
\4\&{else} \&{if} $\\{mem\_end}<\\{mem\_max}$ \1\&{then}\C{or go into virgin
territory}\6
\&{begin} \37$\\{incr}(\\{mem\_end})$;\5
$\|p\K\\{mem\_end}$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\\{decr}(\\{hi\_mem\_min})$;\5
$\|p\K\\{hi\_mem\_min}$;\6
\&{if} $\\{hi\_mem\_min}\L\\{lo\_mem\_max}$ \1\&{then}\6
\&{begin} \37\\{runaway};\C{if memory is exhausted, display possible runaway
text}\6
$\\{overflow}(\.{"main\ memory\ size"},\39\\{mem\_max}+1-\\{mem\_min})$;%
\C{quit; all one-word nodes are busy}\6
\&{end};\2\6
\&{end};\2\2\6
$\\{link}(\|p)\K\\{null}$;\C{provide an oft-desired initialization of the new
node}\6
\&{stat} \37$\\{incr}(\\{dyn\_used})$;\ \&{tats}\C{maintain statistics}\6
$\\{get\_avail}\K\|p$;\6
\&{end};\par
\fi
\M121. Conversely, a one-word node is recycled by calling \\{free\_avail}.
This routine is part of \TeX's ``inner loop,'' so we want it to be fast.
\Y\P\D \37$\\{free\_avail}(\#)\S$\C{single-word node liberation}\6
\&{begin} \37$\\{link}(\#)\K\\{avail}$;\5
$\\{avail}\K\#$;\6
\&{stat} \37$\\{decr}(\\{dyn\_used})$;\ \&{tats}\6
\&{end}\par
\fi
\M122. There's also a \\{fast\_get\_avail} routine, which saves the
procedure-call
overhead at the expense of extra programming. This routine is used in
the places that would otherwise account for the most calls of \\{get\_avail}.
\Y\P\D \37$\\{fast\_get\_avail}(\#)\S\hbox{}$\6
\&{begin} \37$\#\K\\{avail}$;\C{avoid \\{get\_avail} if possible, to save time}%
\6
\&{if} $\#=\\{null}$ \1\&{then}\5
$\#\K\\{get\_avail}$\6
\4\&{else} \&{begin} \37$\\{avail}\K\\{link}(\#)$;\5
$\\{link}(\#)\K\\{null}$;\6
\&{stat} \37$\\{incr}(\\{dyn\_used})$;\ \&{tats}\6
\&{end};\2\6
\&{end}\par
\fi
\M123. The procedure $\\{flush\_list}(\|p)$ frees an entire linked list of
one-word nodes that starts at position \|p.
\Y\P\4\&{procedure}\1\ \37$\\{flush\_list}(\|p:\\{pointer})$;\C{makes list of
single-word nodes available}\6
\4\&{var} \37$\|q,\39\|r$: \37\\{pointer};\C{list traversers}\2\6
\&{begin} \37\&{if} $\|p\I\\{null}$ \1\&{then}\6
\&{begin} \37$\|r\K\|p$;\6
\1\&{repeat} \37$\|q\K\|r$;\5
$\|r\K\\{link}(\|r)$;\6
\&{stat} \37$\\{decr}(\\{dyn\_used})$;\ \&{tats}\6
\4\&{until}\5
$\|r=\\{null}$;\C{now \|q is the last node on the list}\2\6
$\\{link}(\|q)\K\\{avail}$;\5
$\\{avail}\K\|p$;\6
\&{end};\2\6
\&{end};\par
\fi
\M124. The available-space list that keeps track of the variable-size portion
of \\{mem} is a nonempty, doubly-linked circular list of empty nodes,
pointed to by the roving pointer \\{rover}.
Each empty node has size 2 or more; the first word contains the special
value \\{max\_halfword} in its \\{link} field and the size in its \\{info}
field;
the second word contains the two pointers for double linking.
Each nonempty node also has size 2 or more. Its first word is of type
\\{two\_halves}\kern-1pt, and its \\{link} field is never equal to \\{max%
\_halfword}.
Otherwise there is complete flexibility with respect to the contents
of its other fields and its other words.
(We require $\\{mem\_max}<\\{max\_halfword}$ because terrible things can happen
when \\{max\_halfword} appears in the \\{link} field of a nonempty node.)
\Y\P\D \37$\\{empty\_flag}\S\\{max\_halfword}$\C{the \\{link} of an empty
variable-size node}\par
\P\D \37$\\{is\_empty}(\#)\S(\\{link}(\#)=\\{empty\_flag})$\C{tests for empty
node}\par
\P\D \37$\\{node\_size}\S\\{info}$\C{the size field in empty variable-size
nodes}\par
\P\D \37$\\{llink}(\#)\S\\{info}(\#+1)$\C{left link in doubly-linked list of
empty nodes}\par
\P\D \37$\\{rlink}(\#)\S\\{link}(\#+1)$\C{right link in doubly-linked list of
empty nodes}\par
\Y\P$\4\X13:Global variables\X\mathrel{+}\S$\6
\4\\{rover}: \37\\{pointer};\C{points to some node in the list of empties}\par
\fi
\M125. A call to \\{get\_node} with argument \|s returns a pointer to a new
node
of size~\|s, which must be 2~or more. The \\{link} field of the first word
of this new node is set to null. An overflow stop occurs if no suitable
space exists.
If \\{get\_node} is called with $s=2↑{30}$, it simply merges adjacent free
areas and returns the value \\{max\_halfword}.
\Y\P\4\&{function}\1\ \37$\\{get\_node}(\|s:\\{integer})$: \37\\{pointer};%
\C{variable-size node allocation}\6
\4\&{label} \37$\\{found},\39\\{exit},\39\\{restart}$;\6
\4\&{var} \37\|p: \37\\{pointer};\C{the node currently under inspection}\6
\|q: \37\\{pointer};\C{the node physically after node \|p}\6
\|r: \37\\{integer};\C{the newly allocated node, or a candidate for this honor}%
\6
\|t: \37\\{integer};\C{temporary register}\2\6
\&{begin} \37\\{restart}: \37$\|p\K\\{rover}$;\C{start at some free node in the
ring}\6
\1\&{repeat} \37\X127:Try to allocate within node \|p and its physical
successors, and \&{goto} \\{found} if allocation was possible\X;\6
$\|p\K\\{rlink}(\|p)$;\C{move to the next node in the ring}\6
\4\&{until}\5
$\|p=\\{rover}$;\C{repeat until the whole list has been traversed}\2\6
\&{if} $\|s=\O{10000000000}$ \1\&{then}\6
\&{begin} \37$\\{get\_node}\K\\{max\_halfword}$;\5
\&{return};\6
\&{end};\2\6
\&{if} $\\{lo\_mem\_max}+2<\\{hi\_mem\_min}$ \1\&{then}\6
\&{if} $\\{lo\_mem\_max}+2\L\\{mem\_bot}+\\{max\_halfword}$ \1\&{then}\5
\X126:Grow more variable-size memory and \&{goto} \\{restart}\X;\2\2\6
$\\{overflow}(\.{"main\ memory\ size"},\39\\{mem\_max}+1-\\{mem\_min})$;%
\C{sorry, nothing satisfactory is left}\6
\4\\{found}: \37$\\{link}(\|r)\K\\{null}$;\C{this node is now nonempty}\6
\&{stat} \37$\\{var\_used}\K\\{var\_used}+\|s$;\C{maintain usage statistics}\6
\&{tats}\6
$\\{get\_node}\K\|r$;\6
\4\\{exit}: \37\&{end};\par
\fi
\M126. The lower part of \\{mem} grows by 1000 words at a time, unless
we are very close to going under. When it grows, we simply link
a new node into the available-space list. This method of controlled
growth helps to keep the \\{mem} usage consecutive when \TeX\ is
implemented on ``virtual memory'' systems.
\Y\P$\4\X126:Grow more variable-size memory and \&{goto} \\{restart}\X\S$\6
\&{begin} \37\&{if} $\\{lo\_mem\_max}+1000<\\{hi\_mem\_min}$ \1\&{then}\5
$\|t\K\\{lo\_mem\_max}+1000$\6
\4\&{else} $\|t\K(\\{lo\_mem\_max}+\\{hi\_mem\_min}+2)\mathbin{\&{div}}2$;\C{$%
\\{lo\_mem\_max}+2\L\|t<\\{hi\_mem\_min}$}\2\6
$\|p\K\\{llink}(\\{rover})$;\5
$\|q\K\\{lo\_mem\_max}$;\5
$\\{rlink}(\|p)\K\|q$;\5
$\\{llink}(\\{rover})\K\|q$;\6
\&{if} $\|t>\\{mem\_bot}+\\{max\_halfword}$ \1\&{then}\5
$\|t\K\\{mem\_bot}+\\{max\_halfword}$;\2\6
$\\{rlink}(\|q)\K\\{rover}$;\5
$\\{llink}(\|q)\K\|p$;\5
$\\{link}(\|q)\K\\{empty\_flag}$;\5
$\\{node\_size}(\|q)\K\|t-\\{lo\_mem\_max}$;\6
$\\{lo\_mem\_max}\K\|t$;\5
$\\{link}(\\{lo\_mem\_max})\K\\{null}$;\5
$\\{info}(\\{lo\_mem\_max})\K\\{null}$;\5
$\\{rover}\K\|q$;\5
\&{goto} \37\\{restart};\6
\&{end}\par
\U section~125.\fi
\M127. Empirical tests show that the routine in this section performs a
node-merging operation about 0.75 times per allocation, on the average,
after which it finds that $\|r>\|p+1$ about 95\% of the time.
\Y\P$\4\X127:Try to allocate within node \|p and its physical successors, and %
\&{goto} \\{found} if allocation was possible\X\S$\6
$\|q\K\|p+\\{node\_size}(\|p)$;\C{find the physical successor}\6
\&{while} $\\{is\_empty}(\|q)$ \1\&{do}\C{merge node \|p with node \|q}\6
\&{begin} \37$\|t\K\\{rlink}(\|q)$;\6
\&{if} $\|q=\\{rover}$ \1\&{then}\5
$\\{rover}\K\|t$;\2\6
$\\{llink}(\|t)\K\\{llink}(\|q)$;\5
$\\{rlink}(\\{llink}(\|q))\K\|t$;\6
$\|q\K\|q+\\{node\_size}(\|q)$;\6
\&{end};\2\6
$\|r\K\|q-\|s$;\6
\&{if} $\|r>\|p+1$ \1\&{then}\5
\X128:Allocate from the top of node \|p and \&{goto} \\{found}\X;\2\6
\&{if} $\|r=\|p$ \1\&{then}\6
\&{if} $\\{rlink}(\|p)\I\|p$ \1\&{then}\5
\X129:Allocate entire node \|p and \&{goto} \\{found}\X;\2\2\6
$\\{node\_size}(\|p)\K\|q-\|p$\C{reset the size in case it grew}\par
\U section~125.\fi
\M128. \P$\X128:Allocate from the top of node \|p and \&{goto} \\{found}\X\S$\6
\&{begin} \37$\\{node\_size}(\|p)\K\|r-\|p$;\C{store the remaining size}\6
$\\{rover}\K\|p$;\C{start searching here next time}\6
\&{goto} \37\\{found};\6
\&{end}\par
\U section~127.\fi
\M129. Here we delete node \|p from the ring, and let \\{rover} rove around.
\Y\P$\4\X129:Allocate entire node \|p and \&{goto} \\{found}\X\S$\6
\&{begin} \37$\\{rover}\K\\{rlink}(\|p)$;\5
$\|t\K\\{llink}(\|p)$;\5
$\\{llink}(\\{rover})\K\|t$;\5
$\\{rlink}(\|t)\K\\{rover}$;\5
\&{goto} \37\\{found};\6
\&{end}\par
\U section~127.\fi
\M130. Conversely, when some variable-size node \|p of size \|s is no longer
needed,
the operation $\\{free\_node}(\|p,\|s)$ will make its words available, by
inserting
\|p as a new empty node just before where \\{rover} now points.
\Y\P\4\&{procedure}\1\ \37$\\{free\_node}(\|p:\\{pointer};\,\35\|s:%
\\{halfword})$;\C{variable-size node liberation}\6
\4\&{var} \37\|q: \37\\{pointer};\C{$\\{llink}(\\{rover})$}\2\6
\&{begin} \37$\\{node\_size}(\|p)\K\|s$;\5
$\\{link}(\|p)\K\\{empty\_flag}$;\5
$\|q\K\\{llink}(\\{rover})$;\5
$\\{llink}(\|p)\K\|q$;\5
$\\{rlink}(\|p)\K\\{rover}$;\C{set both links}\6
$\\{llink}(\\{rover})\K\|p$;\5
$\\{rlink}(\|q)\K\|p$;\C{insert \|p into the ring}\6
\&{stat} \37$\\{var\_used}\K\\{var\_used}-\|s$;\ \&{tats}\C{maintain
statistics}\6
\&{end};\par
\fi
\M131. Just before \.{INITEX} writes out the memory, it sorts the doubly linked
available space list. The list is probably very short at such times, so a
simple insertion sort is used. The smallest available location will be
pointed to by \\{rover}, the next-smallest by $\\{rlink}(\\{rover})$, etc.
\Y\P\&{init} \37\&{procedure}\1\ \37\\{sort\_avail};\C{sorts the available
variable-size nodes by location}\6
\4\&{var} \37$\|p,\39\|q,\39\|r$: \37\\{pointer};\C{indices into \\{mem}}\6
\\{old\_rover}: \37\\{pointer};\C{initial \\{rover} setting}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\O{10000000000})$;\C{merge adjacent free
areas}\6
$\|p\K\\{rlink}(\\{rover})$;\5
$\\{rlink}(\\{rover})\K\\{max\_halfword}$;\5
$\\{old\_rover}\K\\{rover}$;\6
\&{while} $\|p\I\\{old\_rover}$ \1\&{do}\5
\X132:Sort \|p into the list starting at \\{rover} and advance \|p to $%
\\{rlink}(\|p)$\X;\2\6
$\|p\K\\{rover}$;\6
\&{while} $\\{rlink}(\|p)\I\\{max\_halfword}$ \1\&{do}\6
\&{begin} \37$\\{llink}(\\{rlink}(\|p))\K\|p$;\5
$\|p\K\\{rlink}(\|p)$;\6
\&{end};\2\6
$\\{rlink}(\|p)\K\\{rover}$;\5
$\\{llink}(\\{rover})\K\|p$;\6
\&{end};\6
\&{tini}\par
\fi
\M132. The following \&{while} loop terminates, since the list that starts at
\\{rover} ends with \\{max\_halfword} during the sorting procedure.
\Y\P$\4\X132:Sort \|p into the list starting at \\{rover} and advance \|p to $%
\\{rlink}(\|p)$\X\S$\6
\&{if} $\|p<\\{rover}$ \1\&{then}\6
\&{begin} \37$\|q\K\|p$;\5
$\|p\K\\{rlink}(\|q)$;\5
$\\{rlink}(\|q)\K\\{rover}$;\5
$\\{rover}\K\|q$;\6
\&{end}\6
\4\&{else} \&{begin} \37$\|q\K\\{rover}$;\6
\&{while} $\\{rlink}(\|q)<\|p$ \1\&{do}\5
$\|q\K\\{rlink}(\|q)$;\2\6
$\|r\K\\{rlink}(\|p)$;\5
$\\{rlink}(\|p)\K\\{rlink}(\|q)$;\5
$\\{rlink}(\|q)\K\|p$;\5
$\|p\K\|r$;\6
\&{end}\2\par
\U section~131.\fi
\N133. \[10] Data structures for boxes and their friends.
From the computer's standpoint, \TeX's chief mission is to create
horizontal and vertical lists. We shall now investigate how the elements
of these lists are represented internally as nodes in the dynamic memory.
A horizontal or vertical list is linked together by \\{link} fields in
the first word of each node. Individual nodes represent boxes, glue,
penalties, or special things like discretionary hyphens; because of this
variety, some nodes are longer than others, and we must distinguish different
kinds of nodes. We do this by putting a `\\{type}' field in the first word,
together with the link and an optional `\\{subtype}'.
\Y\P\D \37$\\{type}(\#)\S\\{mem}[\#].\\{hh}.\\{b0}$\C{identifies what kind of
node this is}\par
\P\D \37$\\{subtype}(\#)\S\\{mem}[\#].\\{hh}.\\{b1}$\C{secondary identification
in some cases}\par
\fi
\M134. A \\{char\_node}, which represents a single character, is the most
important
kind of node because it accounts for the vast majority of all boxes.
Special precautions are therefore taken to ensure that a \\{char\_node} does
not take up much memory space. Every such node is one word long, and in fact
it is identifiable by this property, since other kinds of nodes have at least
two words, and they appear in \\{mem} locations less than \\{hi\_mem\_min}.
This makes it possible to omit the \\{type} field in a \\{char\_node}, leaving
us room for two bytes that identify a \\{font} and a \\{character} within
that font.
Note that the format of a \\{char\_node} allows for up to 256 different
fonts and up to 256 characters per font; but most implementations will
probably limit the total number of fonts to fewer than 75 per job,
and most fonts will stick to characters less than 128 (since higher codes
are accessed outside of math mode only via ligatures and the \.{\\char}
operator).
Extensions of \TeX\ intended for oriental languages will need even more
than $256\times256$ possible characters, when we consider different sizes
and styles of type. It is suggested that Chinese and Japanese fonts be
handled by representing such characters in two consecutive \\{char\_node}
entries: The first of these has $\\{font}=\\{font\_base}$ and links to the
second,
while the second identifies the font and the character dimensions.
The saving feature about oriental characters is that most of them have
the same box dimensions. The \\{character} field of the first \\{char\_node}
is an ``extension'' that distinguishes between graphic symbols whose
dimensions are identical for typesetting purposes. Such an extension of
\TeX\ would not be difficult; further details are left to the reader.
In order to make sure that the \\{character} code fits in a quarterword,
\TeX\ adds the quantity \\{min\_quarterword} to the actual code.
Character nodes appear only in horizontal lists, never in vertical lists.
\Y\P\D \37$\\{is\_char\_node}(\#)\S(\#\G\\{hi\_mem\_min})$\C{does the argument
point to a \\{char\_node}?}\par
\P\D \37$\\{font}\S\\{type}$\C{the font code in a \\{char\_node}}\par
\P\D \37$\\{character}\S\\{subtype}$\C{the character code in a \\{char\_node}}%
\par
\fi
\M135. An \\{hlist\_node} stands for a box that was made from a horizontal
list.
Each \\{hlist\_node} is seven words long, and contains the following fields
(in addition to the mandatory \\{type} and \\{link}, which we shall not
mention explicitly when discussing the other node types): The \\{height} and
\\{width} and \\{depth} are scaled integers denoting the dimensions of the
box. There is also a \\{shift\_amount} field, a scaled integer indicating
how much this box should be lowered (if it appears in a horizontal list),
or how much it should be moved to the right (if it appears in a vertical
list). There is a \\{list\_ptr} field, which points to the beginning of the
list from which this box was fabricated; if \\{list\_ptr} is \\{null}, the box
is empty. Finally, there are three fields that represent the setting of
the glue: $\\{glue\_set}(\|p)$ is a word of type \\{glue\_ratio} that
represents
the proportionality constant for glue setting; $\\{glue\_sign}(\|p)$ is
\\{stretching} or \\{shrinking} or \\{normal} depending on whether or not the
glue should stretch or shrink or remain rigid; and $\\{glue\_order}(\|p)$
specifies the order of infinity to which glue setting applies (\\{normal},
\\{fil}, \\{fill}, or \\{filll}). The \\{subtype} field is not used.
\Y\P\D \37$\\{hlist\_node}=0$\C{\\{type} of hlist nodes}\par
\P\D \37$\\{box\_node\_size}=7$\C{number of words to allocate for a box node}%
\par
\P\D \37$\\{width\_offset}=1$\C{position of \\{width} field in a box node}\par
\P\D \37$\\{depth\_offset}=2$\C{position of \\{depth} field in a box node}\par
\P\D \37$\\{height\_offset}=3$\C{position of \\{height} field in a box node}\par
\P\D \37$\\{width}(\#)\S\\{mem}[\#+\\{width\_offset}].\\{sc}$\C{width of the
box, in sp}\par
\P\D \37$\\{depth}(\#)\S\\{mem}[\#+\\{depth\_offset}].\\{sc}$\C{depth of the
box, in sp}\par
\P\D \37$\\{height}(\#)\S\\{mem}[\#+\\{height\_offset}].\\{sc}$\C{height of the
box, in sp}\par
\P\D \37$\\{shift\_amount}(\#)\S\\{mem}[\#+4].\\{sc}$\C{repositioning distance,
in sp}\par
\P\D \37$\\{list\_offset}=5$\C{position of \\{list\_ptr} field in a box node}%
\par
\P\D \37$\\{list\_ptr}(\#)\S\\{link}(\#+\\{list\_offset})$\C{beginning of the
list inside the box}\par
\P\D \37$\\{glue\_order}(\#)\S\\{subtype}(\#+\\{list\_offset})$\C{applicable
order of infinity}\par
\P\D \37$\\{glue\_sign}(\#)\S\\{type}(\#+\\{list\_offset})$\C{stretching or
shrinking}\par
\P\D \37$\\{normal}=0$\C{the most common case when several cases are named}\par
\P\D \37$\\{stretching}=1$\C{glue setting applies to the stretch components}\par
\P\D \37$\\{shrinking}=2$\C{glue setting applies to the shrink components}\par
\P\D \37$\\{glue\_offset}=6$\C{position of \\{glue\_set} in a box node}\par
\P\D \37$\\{glue\_set}(\#)\S\\{mem}[\#+\\{glue\_offset}].\\{gr}$\C{a word of
type \\{glue\_ratio} for glue setting}\par
\fi
\M136. The \\{new\_null\_box} function returns a pointer to an \\{hlist\_node}
in
which all subfields have the values corresponding to `\.{\\hbox\{\}}'.
The \\{subtype} field is set to \\{min\_quarterword}, since that's the desired
\\{span\_count} value if this \\{hlist\_node} is changed to an \\{unset\_node}.
\Y\P\4\&{function}\1\ \37\\{new\_null\_box}: \37\\{pointer};\C{creates a new
box node}\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{box\_node\_size})$;\5
$\\{type}(\|p)\K\\{hlist\_node}$;\5
$\\{subtype}(\|p)\K\\{min\_quarterword}$;\5
$\\{width}(\|p)\K0$;\5
$\\{depth}(\|p)\K0$;\5
$\\{height}(\|p)\K0$;\5
$\\{shift\_amount}(\|p)\K0$;\5
$\\{list\_ptr}(\|p)\K\\{null}$;\5
$\\{glue\_sign}(\|p)\K\\{normal}$;\5
$\\{glue\_order}(\|p)\K\\{normal}$;\5
$\\{set\_glue\_ratio\_zero}(\\{glue\_set}(\|p))$;\5
$\\{new\_null\_box}\K\|p$;\6
\&{end};\par
\fi
\M137. A \\{vlist\_node} is like an \\{hlist\_node} in all respects except that
it
was made from a vertical list.
\Y\P\D \37$\\{vlist\_node}=1$\C{\\{type} of vlist nodes}\par
\fi
\M138. A \\{rule\_node} stands for a solid black rectangle; it has \\{width},
\\{depth}, and \\{height} fields just as in an \\{hlist\_node}. However, if
any of these dimensions is $-2↑{30}$, the actual value will be determined
by running the rule up to the boundary of the innermost enclosing box.
This is called a ``running dimension.'' The \\{width} is never running in
an hlist; the \\{height} and \\{depth} are never running in a vlist.
\Y\P\D \37$\\{rule\_node}=2$\C{\\{type} of rule nodes}\par
\P\D \37$\\{rule\_node\_size}=4$\C{number of words to allocate for a rule node}%
\par
\P\D \37$\\{null\_flag}\S-\O{10000000000}$\C{$-2↑{30}$, signifies a missing
item}\par
\P\D \37$\\{is\_running}(\#)\S(\#=\\{null\_flag})$\C{tests for a running
dimension}\par
\fi
\M139. A new rule node is delivered by the \\{new\_rule} function. It
makes all the dimensions ``running,'' so you have to change the
ones that are not allowed to run.
\Y\P\4\&{function}\1\ \37\\{new\_rule}: \37\\{pointer};\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{rule\_node\_size})$;\5
$\\{type}(\|p)\K\\{rule\_node}$;\5
$\\{subtype}(\|p)\K0$;\C{the \\{subtype} is not used}\6
$\\{width}(\|p)\K\\{null\_flag}$;\5
$\\{depth}(\|p)\K\\{null\_flag}$;\5
$\\{height}(\|p)\K\\{null\_flag}$;\5
$\\{new\_rule}\K\|p$;\6
\&{end};\par
\fi
\M140. Insertions are represented by \\{ins\_node} records, where the %
\\{subtype}
indicates the corresponding box number. For example, `\.{\\insert 250}'
leads to an \\{ins\_node} whose \\{subtype} is $250+\\{min\_quarterword}$.
The \\{height} field of an \\{ins\_node} is slightly misnamed; it actually
holds
the natural height plus depth of the vertical list being inserted.
The \\{depth} field holds the \\{split\_max\_depth} to be used in case this
insertion is split, and the \\{split\_top\_ptr} points to the corresponding
\\{split\_top\_skip}. The \\{float\_cost} field holds the \\{floating\_penalty}
that
will be used if this insertion floats to a subsequent page after a
split insertion of the same class. There is one more field, the
\\{ins\_ptr}, which points to the beginning of the vlist for the insertion.
\Y\P\D \37$\\{ins\_node}=3$\C{\\{type} of insertion nodes}\par
\P\D \37$\\{ins\_node\_size}=5$\C{number of words to allocate for an insertion}%
\par
\P\D \37$\\{float\_cost}(\#)\S\\{mem}[\#+1].\\{int}$\C{the \\{floating%
\_penalty} to be used}\par
\P\D \37$\\{ins\_ptr}(\#)\S\\{info}(\#+4)$\C{the vertical list to be inserted}%
\par
\P\D \37$\\{split\_top\_ptr}(\#)\S\\{link}(\#+4)$\C{the \\{split\_top\_skip} to
be used}\par
\fi
\M141. A \\{mark\_node} has a \\{mark\_ptr} field that points to the reference
count
of a token list that contains the user's \.{\\mark} text.
This field occupies a full word instead of a halfword, because
there's nothing to put in the other halfword; it is easier in \PASCAL\ to
use the full word than to risk leaving garbage in the unused half.
\Y\P\D \37$\\{mark\_node}=4$\C{\\{type} of a mark node}\par
\P\D \37$\\{small\_node\_size}=2$\C{number of words to allocate for most node
types}\par
\P\D \37$\\{mark\_ptr}(\#)\S\\{mem}[\#+1].\\{int}$\C{head of the token list for
a mark}\par
\fi
\M142. An \\{adjust\_node}, which occurs only in horizontal lists,
specifies material that will be moved out into the surrounding
vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}'
operation. The \\{adjust\_ptr} field points to the vlist containing this
material.
\Y\P\D \37$\\{adjust\_node}=5$\C{\\{type} of an adjust node}\par
\P\D \37$\\{adjust\_ptr}\S\\{mark\_ptr}$\C{vertical list to be moved out of
horizontal list}\par
\fi
\M143. A \\{ligature\_node}, which occurs only in horizontal lists, specifies a
composite character that was formed from two or more actual characters.
The second word of the node, which is called the \\{lig\_char} word, contains
\\{font} and \\{character} fields just as in a \\{char\_node}. The characters
that generated the ligature have not been forgotten, since they are needed
for diagnostic messages and for hyphenation; the \\{lig\_ptr} field points to
a linked list of character nodes for those characters.
\Y\P\D \37$\\{ligature\_node}=6$\C{\\{type} of a ligature node}\par
\P\D \37$\\{lig\_char}(\#)\S\#+1$\C{the word where the ligature is to be found}%
\par
\P\D \37$\\{lig\_ptr}(\#)\S\\{link}(\\{lig\_char}(\#))$\C{the list of
characters}\par
\fi
\M144. The \\{new\_ligature} function creates a ligature node having given
contents of the \\{font}, \\{character}, and \\{lig\_ptr} fields.
\Y\P\4\&{function}\1\ \37$\\{new\_ligature}(\|f,\39\|c:\\{quarterword};\,\35%
\|q:\\{pointer})$: \37\\{pointer};\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{small\_node\_size})$;\5
$\\{type}(\|p)\K\\{ligature\_node}$;\5
$\\{subtype}(\|p)\K0$;\C{the \\{subtype} is not used}\6
$\\{font}(\\{lig\_char}(\|p))\K\|f$;\5
$\\{character}(\\{lig\_char}(\|p))\K\|c$;\5
$\\{lig\_ptr}(\|p)\K\|q$;\5
$\\{new\_ligature}\K\|p$;\6
\&{end};\par
\fi
\M145. A \\{disc\_node}, which occurs only in horizontal lists, specifies a
``dis\-cretion\-ary'' line break. If such a break occurs at node \|p, the text
that starts at $\\{pre\_break}(\|p)$ will precede the break, the text that
starts at
$\\{post\_break}(\|p)$ will follow the break, and text that appears in the next
$\\{replace\_count}(\|p)$ nodes will be ignored. For example, an ordinary
discretionary hyphen, indicated by `\.{\\-}', yields a \\{disc\_node} with
\\{pre\_break} pointing to a \\{char\_node} containing a hyphen, $\\{post%
\_break}=\\{null}$,
and $\\{replace\_count}=0$. All three of the discretionary texts must be
lists that consist entirely of character, kern, box, rule, and ligature nodes.
If $\\{pre\_break}(\|p)=\\{null}$, the \\{ex\_hyphen\_penalty} will be charged
for this
break. Otherwise the \\{hyphen\_penalty} will be charged. The texts will
actually be substituted into the list by the line-breaking algorithm if it
decides to make the break, and the discretionary node will disappear at
that time; thus, the output routine sees only discretionaries that were
not chosen.
\Y\P\D \37$\\{disc\_node}=7$\C{\\{type} of a discretionary node}\par
\P\D \37$\\{replace\_count}\S\\{subtype}$\C{how many subsequent nodes to
replace}\par
\P\D \37$\\{pre\_break}\S\\{llink}$\C{text that precedes a discretionary break}%
\par
\P\D \37$\\{post\_break}\S\\{rlink}$\C{text that follows a discretionary break}%
\par
\Y\P\4\&{function}\1\ \37\\{new\_disc}: \37\\{pointer};\C{creates an empty %
\\{disc\_node}}\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{small\_node\_size})$;\5
$\\{type}(\|p)\K\\{disc\_node}$;\5
$\\{replace\_count}(\|p)\K0$;\5
$\\{pre\_break}(\|p)\K\\{null}$;\5
$\\{post\_break}(\|p)\K\\{null}$;\5
$\\{new\_disc}\K\|p$;\6
\&{end};\par
\fi
\M146. A \\{whatsit\_node} is a wild card reserved for extensions to \TeX. The
\\{subtype} field in its first word says what `\\{whatsit}' it is, and
implicitly determines the node size (which must be 2 or more) and the
format of the remaining words. When a \\{whatsit\_node} is encountered
in a list, special actions are invoked; knowledgeable people who are
careful not to mess up the rest of \TeX\ are able to make \TeX\ do new
things by adding code at the end of the program. For example, there
might be a `\TeX nicolor' extension to specify different colors of ink,
and the whatsit node might contain the desired parameters.
The present implementation of \TeX\ treats the features associated with
`\.{\\write}' and `\.{\\special}' as if they were extensions, in order to
illustrate how such routines might be coded. We shall defer further
discussion of extensions until the end of this program.
\Y\P\D \37$\\{whatsit\_node}=8$\C{\\{type} of special extension nodes}\par
\fi
\M147. A \\{math\_node}, which occurs only in horizontal lists, appears before
and
after mathematical formulas. The \\{subtype} field is \\{before} before the
formula and \\{after} after it. There is a \\{width} field, which represents
the amount of surrounding space inserted by \.{\\mathsurround}.
\Y\P\D \37$\\{math\_node}=9$\C{\\{type} of a math node}\par
\P\D \37$\\{before}=0$\C{\\{subtype} for math node that introduces a formula}%
\par
\P\D \37$\\{after}=1$\C{\\{subtype} for math node that winds up a formula}\par
\Y\P\4\&{function}\1\ \37$\\{new\_math}(\|w:\\{scaled};\,\35\|s:\\{small%
\_number})$: \37\\{pointer};\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{small\_node\_size})$;\5
$\\{type}(\|p)\K\\{math\_node}$;\5
$\\{subtype}(\|p)\K\|s$;\5
$\\{width}(\|p)\K\|w$;\5
$\\{new\_math}\K\|p$;\6
\&{end};\par
\fi
\M148. \TeX\ makes use of the fact that \\{hlist\_node}, \\{vlist\_node},
\\{rule\_node}, \\{ins\_node}, \\{mark\_node}, \\{adjust\_node}, \\{ligature%
\_node},
\\{disc\_node}, \\{whatsit\_node}, and \\{math\_node} are at the low end of the
type codes, by permitting a break at glue in a list if and only if the
\\{type} of the previous node is less than \\{math\_node}. Furthermore, a
node is discarded after a break if its type is \\{math\_node} or more.
\Y\P\D \37$\\{precedes\_break}(\#)\S(\\{type}(\#)<\\{math\_node})$\par
\P\D \37$\\{non\_discardable}(\#)\S(\\{type}(\#)<\\{math\_node})$\par
\fi
\M149. A \\{glue\_node} represents glue in a list. However, it is really only
a pointer to a separate glue specification, since \TeX\ makes use of the
fact that many essentially identical nodes of glue are usually present.
If \|p points to a \\{glue\_node}, $\\{glue\_ptr}(\|p)$ points to
another packet of words that specify the stretch and shrink components, etc.
Glue nodes also serve to represent leaders; the \\{subtype} is used to
distinguish between ordinary glue (which is called \\{normal}) and the three
kinds of leaders (which are called \\{a\_leaders}, \\{c\_leaders}, and \\{x%
\_leaders}).
The \\{leader\_ptr} field points to a rule node or to a box node containing the
leaders; it is set to \\{null} in ordinary glue nodes.
Many kinds of glue are computed from \TeX's ``skip'' parameters, and
it is helpful to know which parameter has led to a particular glue node.
Therefore the \\{subtype} is set to indicate the source of glue, whenever
it originated as a parameter. We will be defining symbolic names for the
parameter numbers later (e.g., $\\{line\_skip\_code}=0$, $\\{baseline\_skip%
\_code}=1$,
etc.); it suffices for now to say that the \\{subtype} of parametric glue
will be the same as the parameter number, plus one.
In math formulas there are two more possibilities for the \\{subtype} in a
glue node: \\{mu\_glue} denotes an \.{\\mskip} (where the units are scaled %
\.{mu}
instead of scaled \.{pt}); and \\{cond\_math\_glue} denotes the `\.{%
\\nonscript}'
feature that cancels the glue node immediately following if it appears
in a subscript.
\Y\P\D \37$\\{glue\_node}=10$\C{\\{type} of node that points to a glue
specification}\par
\P\D \37$\\{cond\_math\_glue}=98$\C{special \\{subtype} to suppress glue in the
next node}\par
\P\D \37$\\{mu\_glue}=99$\C{\\{subtype} for math glue}\par
\P\D \37$\\{a\_leaders}=100$\C{\\{subtype} for aligned leaders}\par
\P\D \37$\\{c\_leaders}=101$\C{\\{subtype} for centered leaders}\par
\P\D \37$\\{x\_leaders}=102$\C{\\{subtype} for expanded leaders}\par
\P\D \37$\\{glue\_ptr}\S\\{llink}$\C{pointer to a glue specification}\par
\P\D \37$\\{leader\_ptr}\S\\{rlink}$\C{pointer to box or rule node for leaders}%
\par
\fi
\M150. A glue specification has a halfword reference count in its first word,
representing \\{null} plus the number of glue nodes that point to it (less
one).
Note that the reference count appears in the same position as
the \\{link} field in list nodes; this is the field that is initialized
to \\{null} when a node is allocated, and it is also the field that is flagged
by \\{empty\_flag} in empty nodes.
Glue specifications also contain three \\{scaled} fields, for the \\{width},
\\{stretch}, and \\{shrink} dimensions. Finally, there are two one-byte
fields called \\{stretch\_order} and \\{shrink\_order}; these contain the
orders of infinity (\\{normal}, \\{fil}, \\{fill}, or \\{filll})
corresponding to the stretch and shrink values.
\Y\P\D \37$\\{glue\_spec\_size}=4$\C{number of words to allocate for a glue
specification}\par
\P\D \37$\\{glue\_ref\_count}(\#)\S\\{link}(\#)$\C{reference count of a glue
specification}\par
\P\D \37$\\{stretch}(\#)\S\\{mem}[\#+2].\\{sc}$\C{the stretchability of this
glob of glue}\par
\P\D \37$\\{shrink}(\#)\S\\{mem}[\#+3].\\{sc}$\C{the shrinkability of this glob
of glue}\par
\P\D \37$\\{stretch\_order}\S\\{type}$\C{order of infinity for stretching}\par
\P\D \37$\\{shrink\_order}\S\\{subtype}$\C{order of infinity for shrinking}\par
\P\D \37$\\{fil}=1$\C{first-order infinity}\par
\P\D \37$\\{fill}=2$\C{second-order infinity}\par
\P\D \37$\\{filll}=3$\C{third-order infinity}\par
\Y\P$\4\X18:Types in the outer block\X\mathrel{+}\S$\6
$\\{glue\_ord}=\\{normal}\to\\{filll}$;\C{infinity to the 0, 1, 2, or 3 power}%
\par
\fi
\M151. Here is a function that returns a pointer to a copy of a glue spec.
The reference count in the copy is \\{null}, because there is assumed
to be exactly one reference to the new specification.
\Y\P\4\&{function}\1\ \37$\\{new\_spec}(\|p:\\{pointer})$: \37\\{pointer};%
\C{duplicates a glue specification}\6
\4\&{var} \37\|q: \37\\{pointer};\C{the new spec}\2\6
\&{begin} \37$\|q\K\\{get\_node}(\\{glue\_spec\_size})$;\6
$\\{mem}[\|q]\K\\{mem}[\|p]$;\5
$\\{glue\_ref\_count}(\|q)\K\\{null}$;\6
$\\{width}(\|q)\K\\{width}(\|p)$;\5
$\\{stretch}(\|q)\K\\{stretch}(\|p)$;\5
$\\{shrink}(\|q)\K\\{shrink}(\|p)$;\5
$\\{new\_spec}\K\|q$;\6
\&{end};\par
\fi
\M152. And here's a function that creates a glue node for a given parameter
identified by its code number; for example,
$\\{new\_param\_glue}(\\{line\_skip\_code})$ returns a pointer to a glue node
for the
current \.{\\lineskip}.
\Y\P\4\&{function}\1\ \37$\\{new\_param\_glue}(\|n:\\{small\_number})$: \37%
\\{pointer};\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\6
\|q: \37\\{pointer};\C{the glue specification}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{small\_node\_size})$;\5
$\\{type}(\|p)\K\\{glue\_node}$;\5
$\\{subtype}(\|p)\K\|n+1$;\5
$\\{leader\_ptr}(\|p)\K\\{null}$;\6
$\|q\K\X224:Current \\{mem} equivalent of glue parameter number \|n\X\hbox{}$;\5
$\\{glue\_ptr}(\|p)\K\|q$;\5
$\\{incr}(\\{glue\_ref\_count}(\|q))$;\5
$\\{new\_param\_glue}\K\|p$;\6
\&{end};\par
\fi
\M153. Glue nodes that are more or less anonymous are created by \\{new\_glue},
whose argument points to a glue specification.
\Y\P\4\&{function}\1\ \37$\\{new\_glue}(\|q:\\{pointer})$: \37\\{pointer};\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\|p\K\\{get\_node}(\\{small\_node\_size})$;\5
$\\{type}(\|p)\K\\{glue\_node}$;\5
$\\{subtype}(\|p)\K\\{normal}$;\5
$\\{leader\_ptr}(\|p)\K\\{null}$;\5
$\\{glue\_ptr}(\|p)\K\|q$;\5
$\\{incr}(\\{glue\_ref\_count}(\|q))$;\5
$\\{new\_glue}\K\|p$;\6
\&{end};\par
\fi
\M154. Still another subroutine is needed: this one is sort of a combination
of \\{new\_param\_glue} and \\{new\_glue}. It creates a glue node for one of
the current glue parameters, but it makes a fresh copy of the glue
specification, since that specification will probably be subject to change,
while the parameter will stay put. The global variable \\{temp\_ptr} is
set to the address of the new spec.
\Y\P\4\&{function}\1\ \37$\\{new\_skip\_param}(\|n:\\{small\_number})$: \37%
\\{pointer};\6
\4\&{var} \37\|p: \37\\{pointer};\C{the new node}\2\6
\&{begin} \37$\\{temp\_ptr}\K\\{new\_spec}(\X224:Current \\{mem} equivalent of